North American DDI (NADDI) Conference 2014

The North American Data Documentation Initiative Conference (NADDI) is an opportunity for those using DDI and those interested in learning more about it to come together and learn from each other. Patterned after the successful European DDI conference (EDDI), NADDI 2014 was a two-day conference (April 1-2) with invited and contributed presentations. The conference is of interest to both researchers and data professionals in the social sciences and other disciplines. A full day of training sessions preceded the conference (March 31). One focus of this second year’s conference was the use of DDI in “Documenting Reproducible Research” by individual research teams through the data lifecycle.

Managing Research Data with DDI-L: Supporting Interoperability Between Multiple Systems

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

Research data rarely consists of a single physical file containing all the needed metadata and data in a compact unit. Normally there are multiple data files, documentation, code lists, and related items. In short, each study is a mini-collection of material. Traditionally, library, archive, and data systems have managed their information in slightly different ways. Libraries create one record for one physical object. Archives have a more collection-centric approach expressed as fonds, series, files, and items. DDI, like many metadata specifications, incorporates the object’s “discovery metadata” record within its overall structure, essentially creating an object that serves as an extended record. But is this the most effective way to manage diverse and highly interconnected collections? The Minnesota Population Center is in the process of organizing our metadata in a way that will allow us to effectively interact with a number of discovery and repository systems around the world. The goal is to use DDI-L as a conduit for linking the contents of our data delivery systems, archive metadata, and related holdings to external systems (e.g. DataONE, da|ra, World Bank) that use a wide variety of standards and formats. This case study reflects our work to date toward this goal.

 

Document type: 
Conference presentation

GSIM, CSPA, and Related Activities of the High Level Group

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

Several recent initiatives related to the use of DDI in official statistics are now in progress, under the auspices of the UNECE’s High Level Group (HLG) on the Modernization of Statistical Production. This presentation will focus on the revised Generic Statistical Information Model (GSIM) 1.1, the related Common Statistical Production Architecture (CSPA) and prototypes, and the mapping work relating these to DDI implementation. Relationships to other standards are also presented, including Statistical Data and Metadata eXchange (SDMX), the “syntax-neutral” expression and validation language being jointly developed by DDI and SDMX, and the related process models (GSBPM and its longitudinal equivalent developed by the DDI Alliance).

Document type: 
Conference presentation

Discover the Power of DDI Metadata

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Document type: 
Conference presentation

DDI Profiles to Support Software Development as well as Data Exchange and Analysis

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

This talk will highlight current developments regarding transnational access to confidential microdata. Examples are access from North America to German labour market data and a proposal for a European Remote Access Network (Eu-RAN) that will bring researchers and research data within the European Research Area closer together. A new legal construct, the European Research Infrastructure Consortium (ERIC), opens such solutions to partners outside Europe as well. Better solutions for transnational access are an important step forward. At the same time, none of these solutions will succeed without information about the available data. Good data documentation is needed especially when working with data from another country, and even more acutely when carrying out comparative research with data from multiple countries. Accordingly, modern transnational data access solutions will only succeed if the circumstances of access and accreditation, as well as the quality and content of the data, are documented. Such documentation needs to be easy for users to understand and easy to implement in software tools. Only if data access and data documentation developments go hand in hand will both lines of development succeed and lift transnational research to a higher level.

 

Document type: 
Conference presentation

DDI — more than just an XML-metadata-standard

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

The DDI standard (Codebook / Lifecycle) is designed as an XML standard. In the process of “moving forward,” the community is working on a model-based representation of the concepts and structures included in the standard. But what is this good for? XML is only one possible technical representation of the metadata; there are many others. The presentation gives an overview of technologies actively used by members of the community, such as storing metadata in relational databases, developing APIs to link software systems, and representing the standard as classes in object-oriented languages. A particular focus lies on the JSON format, which has recently become increasingly important in web development. A second aspect of this presentation is that DDI could be used for more than just metadata: it might also be a good starting point for the storage and exchange of research data, providing an alternative to the common formats of proprietary statistical software packages. The presentation is intended for both technical and non-technical audiences.
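As a minimal sketch of the JSON idea mentioned in the abstract, a single DDI-style variable description might be rendered as a JSON object. The field names below are illustrative assumptions, not an official DDI JSON serialization:

```python
import json

# Hypothetical JSON rendering of one DDI-style variable description.
# Field names are invented for illustration; DDI defines no canonical
# JSON serialization here.
variable = {
    "name": "AGE",
    "label": "Age of respondent in years",
    "representation": {"type": "numeric", "decimals": 0},
    "categories": [],  # empty code list: a continuous variable
}

serialized = json.dumps(variable, indent=2)
restored = json.loads(serialized)
assert restored == variable  # the round trip is lossless
print(serialized)
```

The same structure maps naturally onto relational tables or classes in an object-oriented language, which is the broader point of the model-based representation.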

Document type: 
Conference presentation

DDI Discovery: An Overview of Current RDF Vocabularies

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

This presentation will provide a review of the work on the two RDF vocabularies for DDI: (1) the DDI-RDF Discovery vocabulary for publishing metadata about datasets into the Web of Linked Data, and (2) XKOS, an RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary.
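To make the triple-based model concrete, here is a minimal sketch that represents RDF statements as plain (subject, predicate, object) tuples. The `skos:` terms are from the SKOS vocabulary; the example classification (`ex:ISCO`) and its codes are illustrative assumptions, not taken from the presentation:

```python
# RDF statements modeled as (subject, predicate, object) tuples.
# The skos: terms are real SKOS properties; the ex: identifiers are
# invented for illustration.
triples = [
    ("ex:ISCO", "rdf:type", "skos:ConceptScheme"),
    ("ex:ISCO-2", "rdf:type", "skos:Concept"),
    ("ex:ISCO-2", "skos:prefLabel", "Professionals"),
    ("ex:ISCO-2", "skos:inScheme", "ex:ISCO"),
    ("ex:ISCO-21", "skos:broader", "ex:ISCO-2"),
]

def objects(triples, subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(triples, "ex:ISCO-2", "skos:prefLabel"))  # ['Professionals']
```

XKOS layers classification-specific semantics (levels, succession, correspondence) on top of exactly this kind of SKOS concept structure.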

Document type: 
Conference presentation

Dataset Builder Tool: Canadian Research Data Centre Network (CRDCN)

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

In 2013, Statistics Canada’s Research Data Centre (RDC) Metadata Project concluded, producing a suite of DDI tools, including an open-source repurposing tool for researchers called the dataset builder. The tool creates a “repurposing project,” described in XML, that sequences various operations. The idea of repurposing is to reshape the master data into a new dataset for use in a research project. The tool offers several functionalities, including a built-in catalog of StatCan DDI-coded surveys, a search function across multiple datasets, a variable basket supporting variable-level subsetting, the ability to generate statistical scripts (SPSS, SAS, Stata) that transform the source data from the master dataset into the research dataset, and the creation of a sub-sample codebook. This presentation will demonstrate the dataset builder’s functionalities and benefits, and discuss future uses and integration of the tool within the RDC environment.
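The script-generation step can be sketched as a small function that turns a variable basket into statistical syntax. This is a toy sketch under stated assumptions: the file names and variables are invented, and real dataset builder output would add labels, missing-value definitions, and recodes:

```python
def spss_subset_script(master_path, subset_path, variables):
    """Emit minimal SPSS-style syntax that keeps only the basket
    variables from the master file and saves the research dataset.
    A toy sketch of the repurposing idea, not the tool's actual output."""
    keep = " ".join(variables)
    return (
        f"GET FILE='{master_path}' /KEEP={keep}.\n"
        f"SAVE OUTFILE='{subset_path}'.\n"
    )

script = spss_subset_script("master.sav", "subset.sav", ["AGE", "SEX", "INCOME"])
print(script)
```

Because the repurposing project is described in XML, the same project definition could in principle drive equivalent SAS or Stata generators.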

Document type: 
Conference presentation

The Crumbling Wall: Data Archiving and Reproducibility in Published Science

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

Data are the foundation of empirical research, yet all too often the datasets underlying published papers are lost or poorly curated. This is a serious issue, because future researchers are then unable to validate published results, and the data cannot be used to explore new ideas and hypotheses. As part of a study on how the availability of research data is affected by article age, we emailed authors to request the raw data from 516 published articles. These 516 studies were all published between 1991 and 2011, and included a Discriminant Function Analysis (DFA) on morphometric data from animals or plants. We found that broken email addresses and outdated storage media were the main obstacles to obtaining the data, such that we received only 101 datasets in total. However, even when we did receive a data file, there was no guarantee that it matched the exact dataset used in the study itself. To assess how often problems with metadata or data curation affect reproducibility, we tried to recreate the DFA results reported in the paper. Nine papers did not present common types of quantitative results from their DFA and were excluded. For an additional 15 papers we were unable to relate the dataset we received to that used in the original DFA. The reasons included incomprehensible or absent variable labels, DFAs performed on an unspecified random subset of the data, and incomplete datasets. For another 20 papers, the dataset seemed to correspond to the one in the paper, but we could not come close to recreating the authors’ results, which (of course) may stem from an error on either our part or the authors’. We were able to exactly repeat the results of the DFA analyses from 29 papers, and came very close with an additional 17.
Our results illustrate the disconnect between the carefully documented and repeatable science we learned about in school and the grim reality of the current situation: many datasets are lost within a few years, and a significant proportion of the remainder are rendered useless by poor data curation.

Document type: 
Conference presentation

The Complicated Provenance of American Community Survey Data: How Far Will PROV and DDI Take Us?

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

In a series of three papers in 2013, researchers at the Cornell Node of the NSF Census Research Network (http://ncrn.cornell.edu) investigated and proposed solutions for two fundamental yet distinct issues in the curation of quantitative social science data: confidentiality and provenance. We argued that the W3C PROV model, a foundation for semantically rich, interoperable, and web-compatible provenance metadata, is especially important in a web environment in which data from distributed sources and of varying integrity can be combined and derived. In this paper we combine and expand upon these two separate threads, confidentiality and provenance, and experiment with the use of PROV and DDI in documenting the complex provenance chain between the highly confidential environment of the U.S. Census Bureau and restricted and public versions of internal census demographic files. In particular, our presentation will report on our efforts to: 1) test PROV’s ability to describe meaningful relationships between confidential, restricted, and public data at the variable level; and 2) develop a user interface for researchers attempting to understand the relationships between distinct versions of confidential, restricted, and public census files. Longer term, our work should produce a useful metadata resource for users of public and restricted American Community Survey data.
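The idea of a variable-level provenance chain can be sketched with PROV's derivation relation. The entity names below are invented for illustration; only the `wasDerivedFrom` relation itself comes from the W3C PROV model:

```python
# Sketch of a PROV-style derivation chain at the variable level.
# Entity names are hypothetical; prov:wasDerivedFrom is the W3C PROV
# relation linking a derived entity back to its source.
was_derived_from = {
    "public:income_bracket": "restricted:income_topcoded",
    "restricted:income_topcoded": "confidential:income_exact",
}

def provenance_chain(entity, derivations):
    """Walk wasDerivedFrom links back to the original source entity."""
    chain = [entity]
    while chain[-1] in derivations:
        chain.append(derivations[chain[-1]])
    return chain

print(provenance_chain("public:income_bracket", was_derived_from))
# ['public:income_bracket', 'restricted:income_topcoded', 'confidential:income_exact']
```

A user interface over such a chain would let a researcher see, for any public variable, which restricted and confidential variables it descends from and what transformations intervened.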

Document type: 
Conference presentation

Colectica 5 and DDI 3.2: The Next Generation of Metadata Tools

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2014-04-14
Abstract: 

Colectica is a software suite for managing statistical data and survey descriptions. It is based on the DDI Lifecycle metadata standard and provides tools for designing surveys, documenting data, and describing studies. Colectica can import metadata from existing sources and publish documentation on the Web and in other formats. The new Colectica 5 integrates feedback from national statistics institutes, university archives, and commercial data collection organizations. This session will highlight the new functionality available in Colectica 5, including: DDI 3.2, deep integration with data collection systems, new ways to explore and discover data on the Web, and a translatable user interface.

Document type: 
Conference presentation