Data is the backbone of PCORnet, and ensuring the Network represents the right data depth, breadth, and quality standards is essential to its success. With access to data exploding in recent years, harnessing and preparing that information for clinical research is complex. PCORnet addresses this challenge, offering researchers the ability to query data that is comprehensive, standardized, and secure.
PCORnet is a Distributed Research Network (DRN), which means that each of PCORnet’s Partner Networks securely collects and stores its data in data centers within its own institutions. This allows for secure analysis of separate data resources without pooling the data or using a national, central data warehouse. Because information is often recorded in different ways across institutions within Partner Networks, PCORnet’s Common Data Model (CDM) gives Partner Networks, and the Clinical Data Research Networks (CDRNs) in particular, a way to capture data, such as data from their point-of-care Electronic Medical Records (EMRs), and store it in the same way, so that it is standardized and easier to use in research.
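The distributed approach described above can be illustrated with a minimal Python sketch. It is not PCORnet's actual query tooling: the function names (`count_by_sex`, `run_distributed_query`), the site names, and the field names (`patid`, `sex`) are hypothetical and are not taken from the CDM specifications. The sketch assumes every site already stores its records in the same shape, so one query can run unchanged at each site and return only aggregate counts.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical record shape; field names are illustrative only.
Record = Dict[str, str]


def count_by_sex(local_records: List[Record]) -> Counter:
    """Runs at a partner site and returns aggregate counts, never patient rows."""
    return Counter(record["sex"] for record in local_records)


def run_distributed_query(
    sites: Dict[str, List[Record]],
    query: Callable[[List[Record]], Counter],
) -> Counter:
    """Sends the same query to every site and sums the aggregate results."""
    total: Counter = Counter()
    for _site_name, records in sites.items():
        # Only the Counter produced by the query is combined centrally.
        total += query(records)
    return total


# In a real distributed network these records would stay inside each
# institution; they are held in one dictionary here only to keep the
# sketch self-contained.
sites = {
    "site_a": [{"patid": "a1", "sex": "F"}, {"patid": "a2", "sex": "M"}],
    "site_b": [{"patid": "b1", "sex": "F"}],
}

combined = run_distributed_query(sites, count_by_sex)
print(dict(combined))
```

Because the records share a common structure, the coordinator needs no per-site logic; the same function works wherever it is run.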
Some data sources, such as demographics and diagnosis data, are available from several CDRNs, standardized in the CDM and ready for use in research. Other data sources, such as claims data, may be captured by some CDRNs, but the extent to which these data are standardized and ready for use in research will vary.
To ensure the data housed in the CDM is fit for use across a broad research portfolio, PCORnet’s Coordinating Center uses a process called data curation.
Currently, PCORnet represents more than 100 health institutions across the country and offers the ability to query data on approximately 128 million Americans. Data from all of these patients are potentially available for observational research, and data from 65 million of these patients are potentially available for clinical trials. Each patient included has had a medical encounter within the past five years for observational studies, or within the past year for clinical trials*.
While this basic demographic information is important for a general understanding of PCORnet’s scope, PCORnet recognizes that researchers often need more detail for pre-research and study feasibility questions. To provide deeper insight into the Network’s potential pools of patients beyond simple demographic information, PCORnet has queried the data within the Network and offers a Conditions of Interest Summary on up to 46 million individuals (stratified by sex, race, and age*). Each of these individuals received care at a PCORnet Partner Network between October 1, 2014, and September 30, 2015, and had a medical encounter in which they were diagnosed with one of 10 conditions of interest.
When performing research within a DRN like PCORnet, the data stay local. Since it is generally not possible for study investigators to examine the patient-level data at each Partner Network to look for anomalies, it is crucial that the underlying data in each of the Partner Networks’ data centers be of high quality. To ensure this foundational level of data quality, PCORnet’s Coordinating Center uses an iterative, stepwise data curation process, which combines a set of analytic queries and Data Quality Checks with ongoing Partner Network communication and maintenance of detailed CDM Implementation Guidance.
- Data Curation Queries are used to assess the quality, completeness, and characteristics of the data. Analysts at the Coordinating Center carefully examine the data curation results to identify common themes and opportunities for improvement.
- All Partner Networks participate in Data Curation Discussion Forums to review data curation results, discuss identified common themes, and share best practices.
- PCORnet maintains Implementation Guidance to mitigate the variability in how the Partner Networks map their source data into the CDM. With each cycle, findings from data curation help inform the development and refinement of the Implementation Guidance and the Data Quality Checks. See our latest Implementation Guidance, Data Quality Checks, and CDM specifications.
The iterative nature of the data curation process ultimately allows PCORnet to gradually increase the foundational level of data quality, reduce variability across the Network, and increase the transparency and reproducibility of analyses within PCORnet.
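As a rough illustration of what a data curation query might measure, the Python sketch below computes field completeness for a small set of records and flags fields that fall under a threshold. This is an assumption-laden example, not one of PCORnet's actual Data Quality Checks: the function names (`completeness`, `flag_fields`), the field names, the sample records, and the 0.9 threshold are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical record shape; field names are illustrative only.
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Returns, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


def flag_fields(report: Dict[str, float], threshold: float) -> List[str]:
    """Lists the fields whose completeness is below the threshold, sorted by name."""
    return sorted(field for field, frac in report.items() if frac < threshold)


# Made-up sample data for the sketch; not real patient records.
records = [
    {"patid": "1", "sex": "F", "birth_date": "1980-01-01"},
    {"patid": "2", "sex": "", "birth_date": "1975-06-30"},
    {"patid": "3", "sex": "M", "birth_date": None},
    {"patid": "4", "sex": "F", "birth_date": "1990-12-12"},
]

report = completeness(records, ["patid", "sex", "birth_date"])
flagged = flag_fields(report, threshold=0.9)  # threshold is an arbitrary choice
print(report)
print(flagged)
```

A summary like `report` contains no patient-level rows, so it could be shared with a coordinating center and compared across sites from one curation cycle to the next.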