University of Essex staff research data survey

We have now analysed the results of the survey held amongst academic staff of the University of Essex.

The aim of this survey was to:

  • raise and assess awareness amongst research staff about data management planning requirements and practices;
  • assess current practices and support needs with regard to data management and sharing planning;
  • assess current data sharing and publishing levels by researchers;
  • assess whether researchers have a need for a data repository at the university;
  • elicit feedback on and testing of the pilot data repository “Essex Research Data”, to ensure that the repository is fit for purpose for the university’s research community.

All university academic and research staff were invited to complete the online survey.

The survey was developed based on research data management expertise at the UK Data Archive, and also drew inspiration from a recent similar research data management survey held at Oxford University by the DaMaRO project.

The survey questionnaire and a summary report with charts of all responses accompany this report.

  • data management planning
  • storage and backup
  • ethical and legal aspects of data sharing
  • costing data management, preservation and sharing
  • Freedom of Information for research data
  • copyright and intellectual property rights
  • data formats, versioning and quality control for long-term preservation
  • security and controlled access to data

Overall, respondents are fairly familiar with and/or positive about sharing their research data. Forty-five percent of researchers have a funder requirement to share their research data. Twenty percent have legal requirements to do so. Twenty-two percent have been required by a publisher to share or deposit research data, and eighty-one percent consider their data to be an important resource for future research and/or learning.

Forty-five percent of researchers have placed data in a data repository, mainly because they are in favour of data sharing, because it is community practice in their research field, or due to a requirement from publishers or funders. The repositories or data centres used are:

  • UK Data Archive
  • British Oceanographic Data Centre
  • Harvard IQSS Dataverse
  • Interuniversity Consortium for Political and Social Research (ICPSR)
  • Journal supplementary information or repository
  • ArrayExpress
  • British Atmospheric Data Centre
  • Dryad
  • Gene Expression Omnibus (GEO)
  • IRIS database
  • Norwegian Social Science Data Services
  • Pangaea
  • Project website
  • RCSB Protein Data Bank

Those researchers not used to data sharing are either not aware of any suitable data repository or centre (13%), consider their data not suitable for sharing (16%) or do not find the time to prepare data for sharing (11%).

Most respondents would consider sharing some of their data in future, either without restrictions, upon request or with known collaborators.

A quarter of respondents would like to deposit research data in a data repository at the University of Essex in future, but only 20% are likely or very likely to need to place data in a repository during 2013, for the following reasons:

  • for data to be used by the research community
  • to make data available
  • for impact and visibility
  • for research excellence purposes   
  • as funder requirement
  • as journal requirements
  • for long term storage
  • to promote collaboration

All the outputs of the research data survey are available below:

University of Essex data policy and guidance published

The University of Essex has published its Research Data Management Policy, its roadmap to policy implementation, and data management planning guidance developed by the Research Data @ Essex project.

http://www.essex.ac.uk/reo/research_community/research_governance/research_data/

 

 

University of Essex research data survey

We have just launched an institution-wide survey at the University of Essex, targeted at those generating data through primary research. The survey aims to examine current practices and needs at the University of Essex with regard to data management, sharing and planning. A PDF version of the survey questions can be downloaded here.

In creating the survey, we drew on our extensive experience with research data management (RDM) issues at the UK Data Archive, particularly past awareness survey work. We also drew inspiration from other successful work in a similar vein, including Oxford’s recent RDM survey.

After a period of intensive development based on initial user testing, we have simultaneously released our research data repository pilot to the university. This is a great way to help support the case for a sustainably supported repository, and provides a means to gather feedback from real users. By allowing researchers to log in with their university credentials and encouraging the upload of test data, we can examine the results of upload attempts in the pre-release review buffer. For example, which optional fields have been completed? How much information has been provided on the collection methodology?
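As a rough illustration of the two questions above, here is a minimal sketch of how test deposits could be summarised. It assumes the records in the review buffer have been exported as plain Python dictionaries; the field names (`collection_method`, `keywords`, `geographic_coverage`) and the sample records are hypothetical and are not taken from the repository's actual metadata schema or from any EPrints API.

```python
# Hypothetical optional metadata fields; the real schema may differ.
OPTIONAL_FIELDS = ["collection_method", "keywords", "geographic_coverage"]


def completion_counts(records, fields):
    """Count, per optional field, how many records have a non-empty value."""
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if isinstance(value, str):
                value = value.strip()  # whitespace-only text counts as empty
            if value:
                counts[field] += 1
    return counts


def methodology_word_counts(records, field="collection_method"):
    """Return the number of words each record gives for the methodology field."""
    return [len((record.get(field) or "").split()) for record in records]


# Made-up test deposits, for illustration only.
sample_records = [
    {"title": "Test A",
     "collection_method": "Face-to-face interviews with farmers",
     "keywords": ["farming"]},
    {"title": "Test B", "collection_method": "  ", "keywords": []},
    {"title": "Test C", "geographic_coverage": "Essex"},
]

counts = completion_counts(sample_records, OPTIONAL_FIELDS)
lengths = methodology_word_counts(sample_records)
print(counts)
print(lengths)
```

Counting words is only a crude proxy for how much methodological information a depositor has provided, but it is enough to flag records where the field was left empty or filled with a single phrase.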

The repository is still openly available to explore as an external user (though much of the real sample data is still not visible at this stage).

We will post the results of both the survey and the user testing here in early March.

Data management and sharing planning requirements of UK research funders, an update

On 14 January 2013, I presented an update on the UK landscape of research funder data policies and their data management and sharing planning requirements at the IDCC13 workshop ‘Data Management Planning: what’s happened, what’s happening and what’s coming next?’.

Very helpful were the following two references: 

The UK research funders that currently have a data sharing policy and that require a data management and sharing plan as part of an application are:

  • Arts and Humanities Research Council (AHRC)
  • Biotechnology and Biological Sciences Research Council (BBSRC)
  • Cancer Research UK (CRUK)
  • Department for International Development (DFID)
  • Economic and Social Research Council (ESRC)
  • Medical Research Council (MRC)
  • Natural Environment Research Council (NERC)
  • Science and Technology Facilities Council (STFC)
  • Wellcome Trust

Research funders that have a data sharing policy, but currently no data management planning requirements, are:

  • British Academy
  • Department of Health  
  • Engineering and Physical Sciences Research Council (EPSRC)
  • Nuffield Foundation

Here is a brief overview of what each funder requires with an application, and the data topics they want to see described in a plan.

Funder | Required at application | Data topics in DMP
AHRC | Technical plan | Standards, preservation, continued access and use
BBSRC | Data management and sharing plan | Type, format, standards, sharing methods, restrictions, timeframe
CRUK | Data sharing plan | Volume, format, standards, metadata, documentation, sharing method, timescale, preservation, restrictions
DFID | Access and data management plan | Repositories, limits, timescale, responsibilities, resources, access strategy
EPSRC | Policy framework (from 2015) |
ESRC | Data management plan | Volume, type, quality, archiving plans, difficulties sharing, consent sharing, IPR, responsibilities
MRC | Data management plan | Collection methods, documentation, standards, preservation, curation, security, confidentiality, sharing & access, timescale, responsibilities
NERC | Outline data management plan | DM procedures, created data
STFC | Data management plan | Type, preservation, metadata, value, sharing, timescale, resources needed
Wellcome Trust | Data management and sharing plan | What data? When share? Where share? How access? Limits, how preserve? What resources?

My presentation is available on the IDCC13 workshop webpage.

Evaluating Data Management Plans

During the recent JISC Managing Research Data – Benefits & Evidence workshop in Bristol, 29-30 November 2012, we discussed how to evaluate the quality of data management plans (DMPs).

I promised to share our experiences of how in practice we evaluated DMPs for the Rural Economy and Land Use programme (Relu) and recently also for ESRC grant applications.

In an earlier blog post the evaluation methodology was described in more detail.

Relu DMPs

This cross-disciplinary research programme (2005-2012) had its own data policy, with all projects preparing a data management plan at the start of a project (post funding). At the end of the research, data were deposited with the UK Data Archive and the Environmental Information Data Centre.

In a data management plan researchers described:

  • the need for access to existing data sources and any access limitations that may exist
  • datasets planned to be produced by the research project
  • planned quality assurance and back-up procedures for data
  • plans for management and archiving of collected data
  • expected difficulties in making data available for re-use (through data archiving) and measures to overcome such difficulties
  • who holds copyright and intellectual property rights of the data
  • data management roles and responsibilities within the research team

We reviewed the data management plans prepared by 29 large projects, four pilot projects that created and archived data and three fellowships (totalling 36 plans). For each question we evaluated whether the information provided was insufficient (lacking clarity or detail), sufficient or excellent.

Datasets planned to be produced by the research project

Most plans contain sufficiently detailed lists of the various datasets planned to be produced. In a few cases information was vague and award holders were asked to provide better or more detailed information. For each dataset, the format/software in which data will be created or stored is specified and storage details are provided. Depending on the project, storage may be solely on an institutional server or on a combination of server, PCs, institutional virtual environments and back-ups on movable media (CD, DVD, …).

During research projects, research activities may change and actual datasets produced at the end of a project can be different from those initially planned.

Planned quality assurance for data

All plans include good information on how data quality will be ensured. Measures include:

  • institutional quality assurance procedures, ISO standards
  • standard data collection protocols
  • standardised data recording (data entry sheets, validation rules in databases)
  • instrument calibration
  • recording metadata, labelling data
  • documenting methods and procedures
  • training researchers
  • pilot studies
  • double data entry
  • validation checks, cross-checking
  • random checks
  • peer review of data
  • data record forms
  • file naming standards

Planned back-up procedures for data

Overall the information provided within this section is excellent. Most data management plans describe institutional data storage and back-up procedures that are in place. Most projects store data on institutional servers, which guarantees regular back-up and transfers the responsibility to institutional IT staff.

Some projects mention additional back-ups researchers plan to carry out (e.g. onto disks or hard drives, or by sharing copies of data between partner institutions) or state that the principal investigator will hold a master copy of all data, besides data held on partner servers.

Three data management plans failed to incorporate information for partner institutions, only listing procedures at the host institute.

Only four projects have specific data management staff allocated to the project, who have a role in overseeing data storage and back-up procedures (besides other responsibilities).

Expected difficulties in making data available for re-use and measures to overcome difficulties

Only 14 plans provide excellent information on this topic; in 10 plans the information is sufficient, whereas in 12 plans the information is vague or contains only a simple statement that ‘no difficulties in making data available for secondary use are anticipated’. In six projects where no problems in making data available for archiving were foreseen, researchers either did not consider obtaining consent for data obtained through interviews or surveys to be shared, or collected data under unnecessarily strict confidentiality agreements. Data obtained through interviews or surveys could therefore not be archived due to confidentiality restrictions. Researchers thus tend to underestimate potential difficulties in archiving and sharing data, especially for confidential, commercial or sensitive data.

Almost half the plans (17) state that data confidentiality, the inclusion of personal data in research data, and copyright of third party sources may limit the archiving of some research data, with overall valid reasons given. Confidentiality restrictions may be in place due to commercial confidentiality (e.g. business information for farms) or where interviewees are easily identifiable (e.g. public body stakeholders and policy makers). Copyright limitations exist mostly where research projects use licensed data sources within GIS systems, to create derived data or to model research scenarios. Use of OS data in GIS typically limits the sharing of even many derived datasets.

Only six plans then provide information on how such difficulties may be overcome by the researchers, e.g. by anonymising data, aggregating data, obtaining consent to share data, or discussing data archiving with owners of licensed data.

Data copyright and IPR

Copyright / IPR of the data is generally with the researchers. At times there is joint copyright through use of third party data.

Data management responsibilities within the research team

Most projects allocate data management responsibilities to various researchers within the research team – typically one person per partner institution or one person per work package.

A few projects allocate only one person with data management responsibility for the entire project. For cross-institutional projects, it is not clear how that is manageable.

Four projects have a dedicated data manager, database manager or project manager with overall data management responsibility.

ESRC DMPs

ESRC introduced the requirement for a data management plan to be submitted with every grant application in April 2011. A data management plan should describe:

 

  • an assessment of existing data that could be used for the research
  • information on new data that will be created
  • quality assurance of data
  • back-up and security of data
  • expected difficulties in data sharing, e.g. ethical or legal issues
  • copyright and intellectual property rights of data
  • data management responsibilities
  • preparation of data for sharing and archiving

We recently evaluated an anonymous sample of 25 submitted data management plans, scoring the quality of the information provided for each of those eight topics as 1 = insufficient, 2 = sufficient or 3 = excellent. Each plan was evaluated twice, independently, by staff members of the UK Data Archive’s Research Data Management section.

The average quality score for a DMP was 17, with a minimum score of 9 and a maximum score of 23. Nine DMPs scored below 16 (the score for sufficient information being provided for each topic). Six DMPs scored below 12.

DMPs on average provide good to excellent information on assessing existing data and describing new data to be created (average scores of 2.4 and 2.3 respectively). DMPs perform poorest on information about copyright and IPR of research data (average score 1.8).

Scores of 1 (insufficient information provided) were most common for copyright (7 plans), for data management responsibilities (5 plans) and for data preparation (5 plans).
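To make the arithmetic of the scoring scheme concrete, here is a minimal sketch in Python. With eight topics each scored from 1 to 3, a plan's total lies between 8 and 24, and 16 corresponds to "sufficient" on every topic. The topic labels are shortened from the list above, the example plan is invented for illustration, and the sketch does not cover how the two independent evaluations of each plan were combined, since that is not described here.

```python
# Shortened labels for the eight ESRC DMP topics listed above.
TOPICS = [
    "existing data",
    "new data",
    "quality assurance",
    "back-up and security",
    "difficulties in sharing",
    "copyright and IPR",
    "responsibilities",
    "preparation for sharing",
]

INSUFFICIENT, SUFFICIENT, EXCELLENT = 1, 2, 3

# Total for a plan that is "sufficient" on every topic: 8 * 2 = 16.
SUFFICIENT_TOTAL = SUFFICIENT * len(TOPICS)


def total_score(scores):
    """Sum the per-topic scores of one plan, checking the 1-3 range."""
    for topic in TOPICS:
        if topic not in scores:
            raise ValueError(f"missing score for topic: {topic}")
        if scores[topic] not in (INSUFFICIENT, SUFFICIENT, EXCELLENT):
            raise ValueError(f"score out of range for topic: {topic}")
    return sum(scores[topic] for topic in TOPICS)


def weak_topics(scores):
    """List the topics on which a plan provides insufficient information."""
    return [topic for topic in TOPICS if scores[topic] == INSUFFICIENT]


# Invented example plan, not one of the 25 evaluated plans.
example_plan = {topic: SUFFICIENT for topic in TOPICS}
example_plan["existing data"] = EXCELLENT
example_plan["copyright and IPR"] = INSUFFICIENT
example_plan["responsibilities"] = INSUFFICIENT

example_total = total_score(example_plan)
example_weak = weak_topics(example_plan)
print(example_total, example_total < SUFFICIENT_TOTAL, example_weak)
```

A plan like this one falls just below the 16-point threshold despite an excellent description of existing data, which mirrors the pattern seen in the sample: strong on describing data, weaker on copyright and responsibilities.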

Benefits and evidence

At the JISCMRD Benefits-Evidence Workshop in Bristol on 29-30 November we showed which benefits the Research Data @ Essex project will bring to the University of Essex and to the wider data sharing community and which evidence we are gathering to substantiate those benefits.

The benefits are:

  • Increased data management awareness and skills for researchers at the university
  • Increased data management and data management planning awareness and capacity amongst support staff (REO) at the university
  • Visibility of data assets
  • Pilot data repository developed for the university and the wider community

This PowerPoint presentation gives an overview of the evidence for those benefits: RDEssex_Benefits&Evidence

Subject classifications for institutional data catalogues

There are several competing hierarchies which can be used to provide a subject classification for research outputs. Of course, we want to enable cross-repository interoperability as fully as possible, so ideally there would be community adoption of a particular, controlled classification.

EPrints comes with a ready-to-install version of the Library of Congress (LoC) subject tree, which is widely used in academic libraries internationally. However, we quickly realised, from both attempts at ingesting data and researcher feedback, that it does not fit well with ‘real-world’ disciplines and departments in UK Higher Education. Researchers we spoke to found it hard to navigate and lacking the subdisciplines they expected.

Then we received a useful tip from Robin Rice at Edinburgh regarding the HESA JACS3 classification that DataShare are using. This seems a much more intuitive approach, and we have implemented it.

Rde_eprints_jacs3

The question of ‘what to use’ still felt unanswered to us though, so a message to the JISC MRD mailing list followed – resulting in some more useful, but not entirely conclusive, feedback! If there was any consistent point to come out of this, it is that the RCUK classification seems to be in common use. It was a recommendation of the Engage project that this scheme be adopted (good summary of this here). However, there is no well-maintained authority for this, or indeed any central web location at all. So not much of a standard then.

We are still left lacking a clear route, then. A useful activity (though quite a big job at this level of detail) would be to map between the options so as to ascertain similarity. If RCUK and JACS are similar, at least at a high level, this would make a choice between the two less critical.
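A first pass at such a mapping could be automated before anyone reviews it by hand. The sketch below compares the top-level labels of two schemes, reports how many match once normalised, and suggests near matches using Python's standard `difflib` module. The label lists are hypothetical placeholders and are not the actual JACS3 or RCUK headings; the real lists would need to be loaded from each scheme's published source.

```python
import difflib


def normalise(label):
    """Lower-case a label, expand '&' and collapse whitespace."""
    return " ".join(label.lower().replace("&", "and").split())


def overlap(scheme_a, scheme_b):
    """Return the labels shared by both schemes and their Jaccard similarity."""
    a = {normalise(label) for label in scheme_a}
    b = {normalise(label) for label in scheme_b}
    union = a | b
    shared = a & b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


def suggest_matches(scheme_a, scheme_b, cutoff=0.6):
    """For each label in scheme_a, suggest the closest label in scheme_b, if any."""
    by_normalised = {normalise(label): label for label in scheme_b}
    suggestions = {}
    for label in scheme_a:
        close = difflib.get_close_matches(
            normalise(label), list(by_normalised), n=1, cutoff=cutoff
        )
        if close:
            suggestions[label] = by_normalised[close[0]]
    return suggestions


# Hypothetical top-level headings, for illustration only.
scheme_x = ["Biological Sciences", "Computer Science", "Linguistics",
            "Business & Management"]
scheme_y = ["Biological sciences", "Computing", "Linguistics",
            "Business and management", "Economics"]

shared, jaccard = overlap(scheme_x, scheme_y)
suggestions = suggest_matches(scheme_x, scheme_y)
print(sorted(shared), jaccard)
print(suggestions)
```

String similarity only catches headings that are worded alike; disciplines that are named differently or split differently between schemes would still need a manual mapping, which is where most of the effort is likely to go.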

Report on research data management at the University of Essex [download]

The report is the result of early work with the four pilot departments, having used a Data Asset Framework-based methodology to gather information from research staff (including research directors) on how they look after their data at present, and what their data management requirements and expectations for the future are. These findings were crucial in developing our broader infrastructure strategy, and continue to inform our work here. The report also provides an interesting snapshot of a university just prior to the emergence of research data management infrastructure; a snapshot which we suspect bears similarities to other universities in the UK. Many thanks to those members of the Essex Business School, Department of Biological Sciences, Department of Language and Linguistics, and School of Computer Science and Electronic Engineering who so generously gave up their time to help us gather this information.

Download the report here.