This document outlines the data quality validation process for populating the Common Data Model (CDM) and defines the measures that each domain follows during validation. Data quality validation covers several aspects, including data content validation, data integrity, and data profiling, with the goal of improving the content quality and integrity of data in the CDM. Research sites can use this guide locally to improve their data before populating the CDM; applying these checks ahead of time results in fewer data check failures during the data curation process.
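The three aspects named above can be illustrated with a minimal sketch, assuming each CDM table has been loaded as a list of Python dictionaries. The field names `PATID`, `SEX`, and `ENCOUNTERID` are modeled on the PCORnet CDM, but the value set below is an assumed, illustrative subset; the actual checks and value sets should be taken from the current CDM specification and the guide itself.

```python
from collections import Counter

# Assumed, illustrative value set for SEX; verify against the current CDM specification.
VALID_SEX = {"A", "F", "M", "NI", "UN", "OT"}


def content_errors(demographic):
    """Content validation: return PATIDs whose SEX value is outside the value set."""
    return [row["PATID"] for row in demographic if row.get("SEX") not in VALID_SEX]


def orphan_encounters(encounter, demographic):
    """Integrity check: return ENCOUNTERIDs whose PATID has no DEMOGRAPHIC row."""
    known_patients = {row["PATID"] for row in demographic}
    return [row["ENCOUNTERID"] for row in encounter if row["PATID"] not in known_patients]


def profile(rows, field):
    """Profiling: count occurrences of each value of one field (None means missing)."""
    return Counter(row.get(field) for row in rows)


if __name__ == "__main__":
    demographic = [
        {"PATID": "p1", "SEX": "F"},
        {"PATID": "p2", "SEX": "X"},   # outside the value set
        {"PATID": "p3", "SEX": None},  # missing value
    ]
    encounter = [
        {"ENCOUNTERID": "e1", "PATID": "p1"},
        {"ENCOUNTERID": "e2", "PATID": "p9"},  # no matching patient
    ]
    print(content_errors(demographic))              # ['p2', 'p3']
    print(orphan_encounters(encounter, demographic))  # ['e2']
    print(profile(demographic, "SEX"))
```

Keeping each check as a separate function mirrors the separation of content validation, integrity, and profiling, so a site can run or extend them independently.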
Data Curation
Data curation pertains to the management of data throughout its lifecycle: from creation and initial storage, through its use in patient-centered outcomes research (PCOR), to the time when it is archived for posterity.
INSIGHT Data Visualization Templates
The INSIGHT Data Visualization Template provides slide templates of data visualizations that capture the demographic breakdown of a CRN's patient cohort. A list of data elements that can be requested is also included. These slides can be reformatted to suit a CRN's available data elements and to highlight the strengths of its patient cohort.
GPC Tumor Table Transformation and Linkage
The PCORnet tumor table contains data from hospital tumor registries that are formatted according to standards developed by the North American Association of Central Cancer Registries (NAACCR). All hospitals accredited by the American College of Surgeons Commission on Cancer employ trained registrars to abstract medical record data according to these specifications. Researchers can use this resource to transform their own tumor registries.
Structured fields for demographic, clinical, and treatment observations are included, and the data are considered to be of high quality. The GPC tumor table documentation includes specifications for data formats, quality checks, and relationships with other CDM tables. This standardization allows NAACCR data to be linked with the other CDM tables, and it allows queries of the NAACCR data to be deployed quickly across the network.
GPC sites have already transformed their hospital tumor registry data into the PCORnet TUMOR table format. Table specifications can be found here. Sample ETL code and a workflow are attached for reference.
To assess the quality and quantity of tumor registry data in the TUMOR tables at GPC sites, a quality control (QC) script was created to run against the newly created tables. The resulting QC reports are used for quality evaluation.
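As a rough illustration of the kind of summary a QC report might contain, the sketch below computes row counts, distinct patients, and per-column completeness for a TUMOR-like table loaded as a list of dictionaries. This is not the GPC QC script: the column names `PRIMARY_SITE` and `DX_DATE`, the `PATID` patient key, and the 90% threshold are hypothetical choices made for this example.

```python
def qc_report(rows, columns, patient_key="PATID", threshold=90.0):
    """Summarize quantity (row and patient counts) and quality (column completeness).

    `patient_key` and `threshold` are hypothetical defaults for this sketch.
    A value counts as missing when it is None or an empty string.
    """
    total = len(rows)
    patients = {row.get(patient_key) for row in rows if row.get(patient_key)}
    completeness = {}
    for col in columns:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        completeness[col] = round(100.0 * filled / total, 1) if total else 0.0
    low = [col for col, pct in completeness.items() if pct < threshold]
    return {
        "row_count": total,
        "patient_count": len(patients),
        "completeness": completeness,
        "below_threshold": low,
    }


if __name__ == "__main__":
    # Hypothetical TUMOR-like rows; real column names come from the table specification.
    tumor = [
        {"PATID": "p1", "PRIMARY_SITE": "C509", "DX_DATE": "2015-03-02"},
        {"PATID": "p1", "PRIMARY_SITE": "C619", "DX_DATE": None},
        {"PATID": "p2", "PRIMARY_SITE": "", "DX_DATE": "2016-07-19"},
        {"PATID": "p3", "PRIMARY_SITE": "C341", "DX_DATE": "2017-01-05"},
    ]
    report = qc_report(tumor, ["PATID", "PRIMARY_SITE", "DX_DATE"])
    print(report)
```

Reporting completeness as a percentage per column makes reports comparable across sites with different table sizes, which is one reason such summaries are convenient for network-wide evaluation.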
GPC Reusable Observable Study Environment (GROUSE)
GROUSE is a Greater Plains Collaborative (GPC) project (and also the name of its data enclave) to obtain health insurance claims from the Centers for Medicare & Medicaid Services (CMS) through the Research Data Assistance Center (ResDAC). GPC currently has 2011-2017 Medicare data and 2011-2012 Medicaid data from nine states in the GPC.
Data Science Analyst Training
The PEDSnet Data Science Analyst course provides training on the structure and use of the PEDSnet CDM for research and approaches to study-specific data quality assessment.
HERO Data Dictionary
Use this resource to review the content, format, and structure of the HERO (Healthcare Worker Exposure Response & Outcomes) research database and the relationships between its elements.
Daquery
The Department of Biomedical Informatics team at the PaTH Clinical Research Network (CRN) developed Daquery, a tool used to deploy code and to automate and archive network-wide quality assurance queries. The code is publicly available and may be useful in supporting other CRNs' data processes.
PCORnet Common Data Model
The PCORnet Common Data Model, developed by the PCORnet community, standardizes millions of data points from the Network's clinical information systems into a common format. As a result, PCORnet users can pose the same question simultaneously to hundreds of disparate systems and receive a clear, reliable answer.
PaTH: How EHR Data is Collected and Protected via a Chocolate-Making Analogy
The PaTH Clinical Research Network (CRN) developed this guide to explain how electronic health record (EHR) data is captured, protected, and utilized for research purposes via a chocolate-making analogy.
PaTH to Health: Diabetes, Chocolate Making & Data Extraction Video
This video on electronic health data uses the metaphor of chocolate making to lay out clearly how data from electronic health records (EHRs) can be anonymized for research. It is a useful tool for explaining EHRs and the privacy protections involved in building a research network.