Progressing to Integrated Genomics and Molecular CDS

June 2019 - Vol. 8 No. 6 - Page #18

The bioinformatic analysis lab serving the Huntsman Cancer Institute and the University of Utah offers a wide range of services. On the research side, these include research-focused single-cell sequencing, analysis of research samples, and protocol development. On the clinical side, they include clinical research and support for the genetic counseling department. Technically, our research arm is ahead of our ever-increasing clinical needs. Together, however, the two facilitate both the growth of clinical options and the improvement of care, and in this respect they mirror the convergence of research and clinical genomics seen globally.1

Phenotypic data have a long and continually unfolding history of improving health care. Genomic data are poised to bring even greater improvements through precision treatments and finer patient stratification. However, the patient record must hold more than summary genomic results to unlock the potential of multi-omics data. That includes coordinated linkage of patient phenotype to raw data.

From a bioinformatics perspective, tying sequencing data to patient phenotypic data benefits both clinical decision making and research. That said, incorporating large-scale genomic data into the clinical record is a large undertaking that will require progressive steps. Each step will nonetheless benefit clinical and research bioinformatics, patient well-being, and institutional economics and research. Coupling molecular data with the patient record enables both the practice of precision medicine and the research that uncovers new advances.

Value of Institutional Bioinformatics

The democratization of next-generation sequencing (NGS) and the expansion of precision medicine are driving the return of more NGS results while also improving the cost-effectiveness of NGS testing.2 For all practical purposes, these results have pared summary-based reports down to two lists. The first is a short list of actionable or prognostic variants. The second lists potential variants of uncertain significance (VUS). The College of American Pathologists (CAP) recognizes that such NGS tests have two components. Wet bench processes run from sample handling through library preparation and sequencing. Dry bench processes include file generation, alignment, annotation, and filtering (see FIGURE 1).
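The final dry bench filtering step can be illustrated with a short sketch. It partitions already-annotated variants into the two lists a summary report typically carries: actionable or prognostic variants and VUS. It assumes that alignment, variant calling, and annotation have already happened upstream. The `Variant` fields, the classification labels, the choice of which labels count as "actionable," and the `min_depth` cutoff are all hypothetical. They are not a CAP or vendor specification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical annotated-variant record; real pipelines carry many more fields.
@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs: str            # variant description produced by the annotation step
    classification: str  # assumed ACMG-style label assigned during annotation
    depth: int           # read depth at the site


# Assumption: these labels go on the short "actionable or prognostic" list.
REPORTABLE = {"pathogenic", "likely_pathogenic"}


def summarize(variants: Iterable[Variant],
              min_depth: int = 100) -> Tuple[List[Variant], List[Variant]]:
    """Split annotated variants into the two lists a summary report carries.

    Sites below the hypothetical depth cutoff are dropped as failing QC, and
    benign or likely benign calls are left out of the report entirely.
    """
    actionable: List[Variant] = []
    vus: List[Variant] = []
    for v in variants:
        if v.depth < min_depth:
            continue  # insufficient coverage to report
        if v.classification in REPORTABLE:
            actionable.append(v)
        elif v.classification == "uncertain_significance":
            vus.append(v)
    return actionable, vus
```

Everything this function leaves out still resides in the annotated and raw files. Those excluded calls include benign variants and low-coverage sites. The report is therefore a filtered view of the data rather than the data itself.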

From a bioinformatic perspective, fully using the wealth of NGS data alongside the institution's patient population and phenotypic data requires a hands-on approach to the dry bench portion of NGS tests. The institution must process, analyze, and report its patients' genomic data to extract their full value. In other words, the current state of the art, presenting NGS-based results as PDF summaries, is only the tip of the iceberg. At the very least, raw sequencing data are evergreen. They retain value for that patient, for similar patients, and for their families despite the passage of time. As a digital result produced under CLIA (Clinical Laboratory Improvement Amendments), those NGS data are available for reprocessing and review of reportable genetic results. Such reprocessing alone can return as many as 24.9% reclassified variants.3
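Because the stored calls are digital, one lightweight form of review is to re-annotate them against a newer release of a variant-classification resource and flag what changed. The sketch below performs only that comparison. It is simpler than full reprocessing from raw reads. The variant keys, the labels, and the plain dictionaries standing in for two database releases are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one release of a variant-classification resource:
# maps a variant key (e.g. "chrom:pos:ref:alt") to a classification label.
Classifications = Dict[str, str]


def find_reclassified(
    archived_calls: List[str],
    old_db: Classifications,
    new_db: Classifications,
) -> List[Tuple[str, str, str]]:
    """Return (variant, old_label, new_label) for each archived call whose
    classification differs between two database releases.

    Variants absent from a release are treated as "unclassified", so calls
    that are newly curated in the later release also surface for review.
    """
    changes: List[Tuple[str, str, str]] = []
    for key in archived_calls:
        old = old_db.get(key, "unclassified")
        new = new_db.get(key, "unclassified")
        if old != new:
            changes.append((key, old, new))
    return changes
```

For example, suppose a call was labeled uncertain significance in the old release and likely pathogenic in the new one. The function would return it as a `(key, old, new)` tuple for a reviewer. Calls whose label did not change would be skipped.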

Variant reclassification is only one source of added value. Other bioinformatic approaches could also return more results as research establishes their clinical relevance. Examples include microsatellite instability and other mutational signatures. Furthermore, panels continue to expand, and many providers are turning to partially masked, whole exome-based results, so the ability to return both somatic and germline variants will also expand. For this reason, patients' raw data should be archived with their medical records. Many molecular providers report only a limited subset of the germline variants recommended by the ACMG (American College of Medical Genetics and Genomics), if they report germline variants at all. The reason is that germline variant calling has higher computational requirements.

Moreover, many NGS results are masked. When clinically relevant, later unmasking and bioinformatic analysis could serve the patient, while the review of those variants is deferred until it is needed.

Growth of Referral Lab Services

As molecular results have increased, specialized lab service providers have emerged for each step in the return of NGS-based results. For some time, lab directors have been able to use referral labs for any step in the NGS process. CAP supports the use of referral labs under separate CLIA licenses that collectively build a laboratory-developed test (LDT) from sample to result. This contrasts with an LDT model in which a single lab performs all processes under one license (see FIGURE 2). Furthermore, whether an LDT falls under one or multiple CLIA licenses, CAP checklists and proficiency testing should be divided into separate wet and dry bench processes. Such specialization underpins the business model of clinical genomics, precision medicine, and genetic analysis companies, as well as other wet and dry bench process providers.

Process Validation

Regarding NGS analytical wet bench process validation, the CAP Molecular Pathology checklist states that "the output of the NGS analytical wet bench process is a collection of sequence data that requires additional bioinformatics processing and analysis to determine whether the sequence is of sufficient quality and quantity for the intended test. To determine this, and to ensure acceptable beginning-to-end test performance, validation of the NGS analytical wet bench process must be integrated with the bioinformatics process validation for the intended test (MOL.36115)." Language under CAP checklist items MOL.35840, MOL.35845, and MOL.35865 adds further support. It treats the wet bench, bioinformatics, and interpretation components of NGS testing as divisible.
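One common way to assess beginning-to-end performance is to run a well-characterized reference sample through both the wet bench and bioinformatics steps. The resulting calls are then compared with a known truth set. The sketch below computes positive percent agreement and positive predictive value from two sets of variant keys. The key format is an assumption. The checklist text quoted above does not prescribe this code or any particular acceptance threshold.

```python
from typing import Dict, Set


def concordance(called: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Compare variant calls from an end-to-end validation run with a truth set.

    Both arguments are sets of variant keys (hypothetical "chrom:pos:ref:alt"
    strings). Returns counts of true positives, false positives and false
    negatives, plus positive percent agreement, TP / (TP + FN), and positive
    predictive value, TP / (TP + FP). Undefined ratios are returned as NaN.
    """
    tp = len(called & truth)   # calls confirmed by the truth set
    fp = len(called - truth)   # calls the truth set does not contain
    fn = len(truth - called)   # known variants the run missed
    ppa = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "ppa": ppa, "ppv": ppv}
```

A lab would compare figures like these against acceptance criteria it defines during validation. Keeping the metric calculation separate from the pipeline means the same check can be applied when the wet bench and dry bench steps are performed under different CLIA licenses.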

Value of Data Analysis

From an institutional bioinformatics perspective, analyzing and reporting on raw sequencing files carries several advantages beyond the value of the clinical information itself. Sequencing files are digital and nearly universal in format, which makes them a valuable resource for institutions and patients. Furthermore, cloud-based retrieval, processing, analysis, reporting, and archiving of raw sequencing results carry both immediate and future benefits for patients and the institution.

In the immediate term, local processing would allow electronic signatures and the incorporation or linkage of results into the electronic medical record (EMR). In turn, the locally processed data would be available to clinical decision support technologies and molecular tumor boards. These benefits could allow faster reporting of results and improved quality of care. They would also allow information to be tailored to the local EMR and universal genomic coordinates to be reported consistently. Finally, local data processing no longer requires large on-premises servers. Virtual machine environments on Google Cloud and Amazon Web Services can run analyses and archive data in compliant, protected environments, which removes much of the need for local server hardware.

Looking to the future, processing and archiving clinical sequencing data becomes increasingly important as the standard of care shifts toward larger panels. One sign of this shift is labs replacing smaller panels with whole exome sequencing (WES). Bioinformatically, much of the result from larger panels or WES could be partially masked depending on the indication. Those same results would remain available for future analysis, either through unmasking or through reprocessing as reference sequence releases improve and variants are reclassified. Future analyses could also include mutational signatures and could support treatment options through modeling of tumor microenvironments.

Masking and unmasking in a clinically relevant manner defers bioinformatic and variant interpretation work, and its cost, until that work is clinically warranted. It also enables research into germline disease linkage and variant classification. Return of results and incidental findings remains possible, provided it is performed to CAP and CLIA specifications. As mentioned, institutional control of the raw data also allows sequence files to be reprocessed should the need arise.
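Indication-based masking can be illustrated with a minimal sketch. It assumes the full variant set is archived and that only genes on an indication-specific list are exposed for review. Unmasking later simply widens the gene list against the same stored data. The `INDICATION_GENES` mapping, the indication names, and the gene names are hypothetical placeholders, not a recommended panel.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping from clinical indication to the genes reviewed for it.
INDICATION_GENES: Dict[str, Set[str]] = {
    "indication_a": {"GENE1", "GENE2"},
    "indication_b": {"GENE3"},
}

# An archived call as a (gene, variant_key) pair; the archive is never pruned.
Call = Tuple[str, str]


def visible_calls(archive: List[Call], indications: Iterable[str]) -> List[Call]:
    """Return only the archived calls in genes unmasked for the indications.

    All other calls stay masked in the archive. They can be exposed later by
    passing additional indications, with no need to resequence the sample.
    """
    unmasked: Set[str] = set()
    for ind in indications:
        unmasked |= INDICATION_GENES.get(ind, set())
    return [call for call in archive if call[0] in unmasked]
```

Masking is modeled here as a view over the archive rather than as deletion. That choice is what keeps later unmasking and reprocessing possible.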

Raw Sequence Discovery Engines

Clinical integration of genomic data offers immediate advantages. Beyond these, consented, de-identified results coupled with clinical data can provide a wealth of information for cohort analysis. This information is crucial to future advances, such as identifying and reclassifying variants, and is highly coveted by industry discovery engines. Building large genomic or multi-omic discovery engines at scale is quickly moving beyond institutionally developed application suites to platforms from commercial service providers. Some of these products and some internally developed software incorporate bioinformatic workflows for discovery and hypothesis testing. In doing so, they are further democratizing access to those workflows. Commercial platforms also offer highly curated public data. These data enable higher-order cluster analysis and the leveraging of rare mutations, which enhances such efforts. As these platforms expand into the germline space, research opportunities will grow, as will improvements to patient care.


Health care institutions and clinicians are moving beyond PDF-based genomic reports toward coupling clinical and genomic data for cohort analysis. Doing so will require progressively building the infrastructure, workflows, and talent needed to integrate genomic results effectively into the EMR. That capability will be crucial for implementing multi-omic clinical decision support tools and molecular tumor boards. It will also help meet the increasing need for genetic counselors. In essence, institutions need to commit to investing in clinical bioinformatics. They must bring together all the components needed to extract value from the NGS testing they already perform and intend to expand.


  1. Birney E. The Convergence of Research and Clinical Genomics. Am J Hum Genet. 2019;104(5):781-783.
  2. Schofield D, Alam K, Douglas L, et al. Cost-effectiveness of massively parallel sequencing for diagnosis of paediatric muscle diseases. npj Genom Med. 2017;2. Published March 3, 2017.
  3. Mersch J, Brown N, Pirzadeh-Miller S, et al. Prevalence of Variant Reclassification Following Hereditary Cancer Genetic Testing. JAMA. 2018;320(12):1266-1274.

Aaron Atkinson, PhD, is a clinical genomicist and research scientist in high-throughput sequencing and bioinformatic analysis at the Huntsman Cancer Institute and University of Utah Health.

