Understanding the biology of the human body and how to manipulate it creates a foundation for research that can help society fight major diseases such as cancer. Identifying the complex relationships that define a healthy or cancerous cell, for example, poses a major computational problem. Identifying data dependencies within and across genomic data sets is among the data-driven challenges considered in this effort.
Use of Data in Cancer Research
Javier Arsuaga
Cancer is characterized by copy number changes in the form of amplifications and deletions of the genome. Several experimental methods are used to detect copy number changes, but identifying them remains challenging because of experimental noise and the presence of copy number changes that do not seem to have biological significance. In topological analysis of aCGH, we combine methods from statistical genetics and topology to identify copy number changes associated with specific tumor types. In this process, each patient profile is transformed into a point cloud from which a sequence of simplicial complexes (“filtration”) can be obtained. While the figure on the left may appear uninformative, the figure on the right shows two important clusters (one in red around the origin and one large cluster colored in green detected through the filtration). The green cluster represents an amplification in the selected genomic region. (Data from Horlings et al. 2010).
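The idea of a filtration can be illustrated with a minimal sketch. The code below is not the method used in the study; it assumes a small, made-up two-dimensional point cloud and one common convention for the Vietoris-Rips construction (two points are joined when their distance is at most the scale `epsilon`). It tracks only the simplest topological feature, the number of connected components, as the scale grows. The names `count_components`, `betti0_curve`, and `cloud` are hypothetical.

```python
from itertools import combinations
from math import dist


def count_components(points, epsilon):
    """Count connected components when points at distance <= epsilon are joined."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= epsilon:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


def betti0_curve(points, scales):
    """Number of components at each scale of the filtration."""
    return [(eps, count_components(points, eps)) for eps in scales]


# Made-up point cloud (assumption): one group near the origin, one far from it.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (2.0, 2.0), (2.1, 2.0), (2.0, 2.1)]

curve = betti0_curve(cloud, [0.05, 0.15, 3.0])
for eps, n_components in curve:
    print(f"epsilon={eps}: {n_components} component(s)")
```

Components that persist over a wide range of scales, such as the two groups in this toy cloud, are the kind of feature a filtration is designed to expose; isolated points that merge almost immediately behave more like noise.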
Fluorescence Lifetime Imaging (FLIm)
Laura Marcu, Ph.D.
Fluorescence Lifetime Imaging (FLIm) is a real-time, fiber-based imaging modality capable of detecting variations in the biochemical composition of tissue (e.g. collagen, elastin, metabolic co-factors) and discriminating between tissue conditions (e.g. healthy, cancer, necrosis, atherosclerosis). The data captured by this approach includes multiple spectroscopic FLIm parameters (average fluorescence lifetime, intensity ratio, Laguerre expansion coefficients) as well as the fluorescence decay waveforms captured by the FLIm instrument. Various analysis approaches have been employed to understand and utilize this data for applications including cancer margin assessment, tissue engineering, and intravascular diagnostics.
Relating FLIm data to specific tissue conditions or pathologies requires a number of methods, such as:
- Univariate statistics (parametric and non-parametric) are calculated for spectroscopic FLIm parameters and used to identify statistically significant changes between tissue conditions for in vivo and ex vivo specimens.
- Linear regression is used to correlate FLIm parameters with biochemical composition (e.g. collagen content, cell number) for tissue engineering applications (e.g. assessment of tissue maturation).
- Multivariate analysis techniques (e.g. multivariate regression, support vector machines) have also been applied to tissue discrimination (e.g. distinguishing normal from atherosclerotic aorta, breast cancer detection in ex vivo specimens) and correlative studies.
- Machine learning (e.g. convolutional neural networks, support vector machines) has been used to perform feature extraction and classification using fluorescence decay waveforms in order to classify tissue and assist in intraoperative cancer margin assessment. Deep learning methods are employed to extract additional discerning features that may not be captured by a discrete set of FLIm parameters.
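As an illustration of the linear regression step listed above, the following minimal sketch fits a straight line relating one FLIm parameter to one biochemical measurement and reports the Pearson correlation. All values are invented for illustration and are not measurements from any study; the function `fit_line` and the variable names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Invented example data (assumption): average fluorescence lifetime in ns
# and collagen content in arbitrary units for five hypothetical samples.
lifetime_ns = [3.0, 3.5, 4.0, 4.5, 5.0]
collagen_au = [10.0, 14.0, 21.0, 24.0, 31.0]

slope, intercept, r = fit_line(lifetime_ns, collagen_au)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A fit of this kind only summarizes a linear association between two variables; the multivariate and learning-based methods in the list address cases where several FLIm parameters or whole decay waveforms are considered together.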
Multi-center clinical studies using FLIm technology are currently being planned. These studies will require both large-scale database management (optical data, other medical imaging data such as MRI and CT, histopathology, and other relevant patient-related clinical data) and deep learning and AI methods to make sense of the data.