Committing to Data Quality (Keynote address)

Author: 
Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

There is evidence across domains of increasing efforts to make data openly available for reuse and reproducibility. We also see growth in the number of institutional and domain-specific repositories that provide storage of and access to research data. Much of this activity responds to requirements set by funders, governments, and institutions, but it also reflects changing cultural norms about open science and scholarly communication (Peer, 2014). However, openness in itself has little value unless it is "intelligent openness" (Royal Society, 2012). This means that published data should be accessible (can they be readily located?), intelligible (can they be understood?), assessable (can their source and reliability be evaluated?), and reusable (do the data have all the associated information required for reuse?). Given that a high proportion of data are being deposited without a curatorial review to check them for reuse and reproducibility, we anticipate data loss due to problems that a committed data review process could resolve. That process includes examining data, documentation, and code to ensure that they meet the requirement of "independent understandability" (OAIS, 2012) for informed reuse. Because much of the "long tail" of data won't make it into archives that provide curation and quality review, it will fall to researchers to prepare data adequately for informed reuse as part of their standard data management strategies, ideally within the research workflow. We propose that if points in the lifecycle were marked as explicit moments for quality review, awareness of and commitment to data quality would become more salient, and best practices would be followed more closely and with more zeal. How does the DDI fit into this challenge? In the social sciences, there is a history of documentation that provides what is required for data to be usable and understandable, and the DDI was built upon that history.
Organizations that implement the DDI are able to provide highly usable data that meet the quality criteria discussed in this talk. The DDI Lifecycle model supports the production and management of high-quality data documentation and could play a significant role in providing tools to support best practices by researchers, in collecting more of the source materials produced during the research workflow that can improve understandability and reuse, and in developing tools for data quality review by curators and publishers. Despite many obstacles, we believe that stewardship of data in the context of "really reproducible research" demands increased attention to the independent understandability of data for informed reuse. Improving the quality of data is an investment in future data sharing, and it is an obligation of any entity that assumes responsibility for the data.

References:

Peer, L. (2014). Mind the Gap: Data they share may not be data you can use. ISPS Blog. http://isps.yale.edu/news/blog/2014/03/mind-the-gap#.U1PeslwQebA

Royal Society. (2012). Science as an open enterprise: open data for open science. https://royalsociety.org/policy/projects/science-public-enterprise/Report/

CCSDS. (2012). Reference Model of an Open Archival Information System (OAIS). http://public.ccsds.org/publications/archive/650x0m2.pdf

Data Documentation Initiative. http://www.ddialliance.org

Peer, L., Green, A. & Stephenson, E. (2014). Committing to Data Quality Review. IJDC, forthcoming. Preprint: http://isps.yale.edu/sites/default/files/files/CommitingToDataQualityReview_idcc14-PrePrint.pdf

Description: 

Ann Green, Digital Lifecycle Research & Consulting

Language: 
English
Document type: 
Conference presentation
Rights: 
Copyright remains with the author.