Researchers use machine learning to unlock the genomic code in clinical cancer samples

A new paper from the University of Helsinki, published today in Nature Communications, presents a method for accurately analyzing genomic data from archival cancer biopsies. The tool uses machine learning to correct for formalin-induced DNA damage and reveal the true mutational processes in tumor samples. This could unlock the considerable medical value of millions of archival cancer samples.

Molecular diagnosis helps match the right patient with the right cancer treatment. The researchers were particularly interested in DNA profiling of clinical cancer samples, which hospitals routinely preserve as formalin-fixed, paraffin-embedded (FFPE) tissue.

“This invaluable resource is currently not being used for molecular diagnosis because of poor DNA quality. Formalin causes severe damage to DNA, which poses an unavoidable challenge for analyzing cancer genomes in preserved tissues,” says lead author Qingli Guo from the University of Helsinki.

Analyzing mutational processes in cancer genomes can help detect cancer early, diagnose it accurately, and reveal why some cancers become resistant to treatment. The new method could dramatically accelerate the development of clinical applications that directly affect the care of future cancer patients.

The new method predicted more than 90% of cancer mutational processes

Lead author Qingli Guo, working in close collaboration with scientists from The Institute of Cancer Research (ICR), London, and Queen Mary University of London, developed a machine learning tool named FFPEsig to unravel exactly how formalin mutates DNA.

“Our results show that, without noise correction, nearly half of the cancer processes are normally missed. Using FFPEsig, however, more than 90% of them were accurately predicted,” says Guo.

Cancer evolves gradually. Profiling mutational processes in longitudinal samples, taken from the same patient over time, helps identify clinically informative predictors and diagnose each stage of a tumor.

“Our finding enables the characterization of clinically relevant signatures from preserved tumor biopsies stored at room temperature for decades. With a deep understanding of how formalin affects cancer genomes, our study opens up a huge opportunity to apply existing signature detection assays to large, cost-effective collections of archival samples,” say the researchers.

The researchers point out that the method does not yet completely remove artifacts from FFPE samples that show batch effects, and that the tool's performance varies by cancer type, so any findings must be interpreted with care. In the future, they are interested in applying their methods to a much broader range of archival samples.