Common Data Model (CDM) Data Quality Validation

This document outlines the data quality validation process for populating the CDM and defines the measures that each domain follows during validation. Data quality validation covers data content validation, data integrity, and data profiling, with the goal of improving the content quality and integrity of the CDM. Research sites can use this guide locally to improve their data before populating the CDM. Applying these checks ahead of time reduces data check failures during the data curation process.

Access the CDM Data Quality Validation guide here.
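
As a rough illustration of the three kinds of checks named above (content validation, integrity, and profiling), a site might run something like the following sketch locally before populating the CDM. The value set, table layouts, and function names are assumptions made for this example and are not taken from the guide; consult the guide and the CDM specification for the actual rules.

```python
from collections import Counter

# Hypothetical value set for illustration only; the real permitted values
# for each field are defined in the CDM specification.
ALLOWED_SEX = {"F", "M", "NI", "UN", "OT"}


def check_content(rows, field, allowed):
    """Content validation: return rows whose field value is outside the allowed set."""
    return [r for r in rows if r.get(field) not in allowed]


def check_integrity(child_rows, parent_rows, key):
    """Integrity: return child rows whose key has no matching parent row."""
    parent_keys = {r[key] for r in parent_rows}
    return [r for r in child_rows if r.get(key) not in parent_keys]


def profile(rows, field):
    """Profiling: count each value of a field, treating empty values as None."""
    return Counter(r.get(field) or None for r in rows)


# Made-up sample rows; column names are assumed for the example.
demographic = [
    {"PATID": "p1", "SEX": "F"},
    {"PATID": "p2", "SEX": "X"},   # value outside the allowed set
    {"PATID": "p3", "SEX": ""},    # missing value
]
encounter = [
    {"ENCOUNTERID": "e1", "PATID": "p1"},
    {"ENCOUNTERID": "e2", "PATID": "p9"},  # no matching patient
]

bad_sex = check_content(demographic, "SEX", ALLOWED_SEX)
orphans = check_integrity(encounter, demographic, "PATID")
sex_profile = profile(demographic, "SEX")

print(len(bad_sex), "rows fail content validation")
print(len(orphans), "encounter rows have no matching patient")
print(dict(sex_profile))
```

Running checks of this kind before submission gives a site a list of specific rows to correct, rather than learning about the problem from a failed data check during curation.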

INSIGHT Data Visualization Templates

The INSIGHT Data Visualization Template provides template data visualizations that capture the demographic breakdown of a CRN’s patient cohort. It also includes a list of the data elements that can be requested. The slides can be reformatted to suit a CRN’s available data elements and to highlight the strengths of its patient cohort.

Access the Data Visualization Template here.

GPC Tumor Table Transformation and Linkage

The PCORnet tumor table contains data from hospital tumor registries, formatted according to standards developed by the North American Association of Central Cancer Registries (NAACCR). All hospitals accredited by the American College of Surgeons Commission on Cancer employ trained registrars to abstract medical record data according to these specifications. Researchers can use this resource to transform their own tumor registries.

Access the GPC Tumor Table Transformation and Linkage resource.

Structured fields for demographic, clinical, and treatment observations are included, and the data are considered to be high quality. GPC tumor table documentation includes specifications for data formats, quality checks, and relationships with other CDM tables. This standardization allows linkages between NAACCR data and the other CDM tables. It also allows queries of the NAACCR data to be quickly deployed across the network.

GPC sites have already transformed their hospital tumor registry data into the PCORnet TUMOR table format. Table specifications can be found here. Sample ETL code and a workflow are attached for reference.

To assess the quality and quantity of tumor registry data found in the TUMOR table at GPC sites, a quality control script was created to be run against the newly created TUMOR tables. QC reports are being used for quality evaluation.
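
The sketch below shows, in a hedged way, what a quality and quantity summary over a TUMOR table could look like. It is not the GPC quality control script. The column names (`PATID`, `DATE_OF_DIAGNOSIS`, `PRIMARY_SITE`), the date format, and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def tumor_qc_report(tumor_rows, known_patids, required_fields):
    """Summarize record counts, missing values, and linkage for tumor rows.

    tumor_rows: list of dicts, one per tumor record (hypothetical layout).
    known_patids: set of patient IDs present in the other CDM tables.
    required_fields: field names that are expected to be populated.
    """
    return {
        # Quantity: how many tumor records were loaded.
        "total_records": len(tumor_rows),
        # Quality: how many records leave each required field empty.
        "missing": {
            f: sum(1 for r in tumor_rows if not r.get(f))
            for f in required_fields
        },
        # Linkage: records whose patient ID is absent from the other tables.
        "unlinked_records": sum(
            1 for r in tumor_rows if r.get("PATID") not in known_patids
        ),
        # Distribution by diagnosis year, assuming ISO dates (YYYY-MM-DD).
        "records_by_dx_year": Counter(
            r["DATE_OF_DIAGNOSIS"][:4]
            for r in tumor_rows
            if r.get("DATE_OF_DIAGNOSIS")
        ),
    }


# Made-up sample rows for the example.
tumor = [
    {"PATID": "p1", "DATE_OF_DIAGNOSIS": "2015-03-02", "PRIMARY_SITE": "C509"},
    {"PATID": "p2", "DATE_OF_DIAGNOSIS": "", "PRIMARY_SITE": "C341"},
    {"PATID": "p7", "DATE_OF_DIAGNOSIS": "2015-11-20", "PRIMARY_SITE": ""},
]
report = tumor_qc_report(
    tumor,
    known_patids={"p1", "p2"},
    required_fields=["DATE_OF_DIAGNOSIS", "PRIMARY_SITE"],
)
print(report)
```

A summary like this can be compared across sites, since every site reports the same counts over the same table format.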

GPC Reusable Observable Study Environment (GROUSE)

GROUSE is a Greater Plains Collaborative (GPC) project, as well as the name of the data enclave, to obtain health insurance claims from the Centers for Medicare & Medicaid Services through the Research Data Assistance Center (ResDAC). GPC currently has 2011-2017 Medicare data and 2011-2012 Medicaid data from 9 states in the GPC.

Access GPC’s GROUSE here.

Data Science Analyst Training

The PEDSnet Data Science Analyst course provides training on the structure and use of the PEDSnet CDM for research and approaches to study-specific data quality assessment.

Access the Data Science Analyst course.

HERO Data Dictionary

Use this resource to review information, content, format, and structure of the HERO (Healthcare Worker Exposure Response & Outcomes) Research database and the relationship between its elements.

Access the resource here.

Daquery

The PaTH Clinical Research Network (CRN) Department of Bio-Medical Informatics team developed Daquery, a tool used to deploy code and to automate and archive network-wide quality assurance queries. The code is publicly available and may be useful in supporting other CRNs’ data processes.

Access the Daquery resource here.

PCORnet Common Data Model

The PCORnet Common Data Model, developed by the PCORnet community, standardizes millions of data points from the Network’s clinical information systems into a common format. As a result, users of PCORnet can pose the same question to hundreds of disparate systems simultaneously and receive a clear, reliable answer.

Access the PCORnet CDM.

PaTH: How EHR Data is Collected and Protected via a Chocolate-Making Analogy

The PaTH Clinical Research Network (CRN) developed this guide to explain how electronic health record (EHR) data is captured, protected, and utilized for research purposes via a chocolate-making analogy.

Access the resource here.

PaTH to Health: Diabetes, Chocolate Making & Data Extraction Video

This video on electronic health data utilizes the metaphor of making chocolate to clearly lay out how electronic health records can be used to anonymize data. It is a useful tool for clearly explaining EHRs and the privacy inherent in building a research network.

Access the resource here.