Recent technologies allow us to measure thousands of transcripts, proteins or metabolites per sample. However, there are typically far fewer samples than measured features, which severely limits the statistical power of analyses such as linking molecular traits to a phenotype, for example the efficacy of a therapy.
One strategy to address this issue is to integrate and compress these data sets into a smaller number of features. In our group we develop approaches that extract such features so that they map directly onto biochemically distinct elements (Fig. 1). In this way we not only reduce the number of variables, but also obtain features that are directly interpretable and often lend themselves to experimental validation.
Genomic data, such as somatic mutations in cancer, can be condensed into information about which pathways are mutated in a statistically significant way. We developed the tool SLAPEnrich (Iorio et al., bioRxiv) to perform such analyses.
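As an illustration of the kind of test involved, the sketch below estimates, for each sample, the probability that at least one of its mutations falls in a pathway's genes, given the sample's total mutation burden and the summed length of those genes. It then asks whether more samples than expected carry a pathway mutation, using an exact Poisson-binomial tail. This is a simplified sketch in the spirit of SLAPEnrich, not its implementation or API; the uniform-mutation assumption, the function names and the toy gene lengths are all assumptions made for illustration.

```python
import math

def prob_pathway_hit(n_mutations, pathway_length, exome_length):
    """P(sample has >= 1 mutation in the pathway's genes), assuming the sample's
    mutations fall uniformly at random over the exome (Poisson approximation)."""
    rate = n_mutations / exome_length  # expected mutations per base in this sample
    return 1.0 - math.exp(-rate * pathway_length)

def poisson_binomial_tail(probs, k):
    """P(X >= k) for X = sum of independent Bernoulli(p_i), by dynamic programming."""
    dist = [1.0]  # dist[j] = P(exactly j hits among the samples processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)  # this sample not hit
            new[j + 1] += q * p      # this sample hit
        dist = new
    return sum(dist[k:])

def pathway_enrichment(samples, pathway_genes, gene_lengths, exome_length):
    """samples: list of (total mutation count, set of mutated genes) per sample.
    Returns (observed hit samples, expected hit samples, one-sided p-value)."""
    pathway_length = sum(gene_lengths[g] for g in pathway_genes)
    probs = [prob_pathway_hit(n, pathway_length, exome_length) for n, _ in samples]
    observed = sum(1 for _, mutated in samples if mutated & pathway_genes)
    return observed, sum(probs), poisson_binomial_tail(probs, observed)

# Toy data: gene names, gene lengths and exome size are made up for illustration.
gene_lengths = {"GENE_A": 3000, "GENE_B": 2000, "GENE_C": 5000}
pathway = {"GENE_A", "GENE_B"}
samples = [(50, {"GENE_A"}), (20, {"GENE_B"}), (100, {"GENE_C"}), (10, {"GENE_A"})]
observed, expected, pvalue = pathway_enrichment(samples, pathway, gene_lengths, 3e7)
print(observed, expected, pvalue)
```

Because each sample gets its own probability, a pathway hit in a heavily mutated sample is less surprising than one in a lightly mutated sample, which a single pooled mutation rate would not capture.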
In the case of gene expression, a number of functional features can be extracted. A common approach is to map gene expression onto the genes that make up a given pathway. Our analyses, however, suggest that it is more appropriate to treat gene expression as the downstream effect of signaling pathways, that is, to look at the expression of genes known to be affected by a given pathway (Schubert et al., bioRxiv).
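The footprint idea can be sketched as a weighted sum: each pathway is represented by a set of responsive genes with signed weights, and its score in a sample is the sum of those genes' z-scored expression multiplied by their weights. In practice such weights would be derived from data on how genes respond to pathway perturbation; here the gene names, weights and expression values are hypothetical, and the scoring is a minimal illustration rather than the published method.

```python
from statistics import mean, pstdev

def zscore_rows(expr):
    """expr: {gene: [expression per sample]} -> the same values z-scored per gene."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def footprint_scores(expr, footprints):
    """footprints: {pathway: {responsive gene: weight}}.

    Each pathway's score in a sample is the weighted sum of the z-scored
    expression of genes that respond to the pathway, not of its member genes.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    return {
        pathway: [sum(w * z[g][s] for g, w in weights.items() if g in z)
                  for s in range(n_samples)]
        for pathway, weights in footprints.items()
    }

# Hypothetical data: two samples; weights invented for illustration.
expr = {"RESP_1": [1.0, 3.0], "RESP_2": [2.0, 0.0], "MEMBER_1": [5.0, 5.0]}
footprints = {"PATHWAY_X": {"RESP_1": 1.0, "RESP_2": -0.5}}
scores = footprint_scores(expr, footprints)
print(scores)
```

MEMBER_1 stands in for a gene that belongs to the pathway itself; it contributes nothing to the score unless it is also among the responsive genes.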
Additionally, gene expression data can be used to estimate the activity of transcription factors from the mRNA levels of their direct target genes, and this information can be used to improve cancer biomarkers (Garcia-Alonso et al., bioRxiv).
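A minimal sketch of target-based transcription factor activity estimation, assuming a signed regulon for each factor (+1 for targets it activates, -1 for targets it represses) and a per-gene differential expression statistic. The score sums the sign-adjusted statistics of the measured targets and divides by the square root of their number, which keeps scores on a comparable scale across regulons of different size when the statistics are roughly standard normal under the null. Note that the factor's own mRNA level is not used. The statistic, the `min_targets` cutoff and the toy names are assumptions for illustration, not necessarily what Garcia-Alonso et al. use.

```python
from math import sqrt

def tf_activity(gene_stats, regulons, min_targets=3):
    """gene_stats: {gene: differential expression statistic, e.g. a t-value}.
    regulons: {tf: {target gene: mode}}, mode +1 = activation, -1 = repression.
    Returns {tf: sum(mode * stat over measured targets) / sqrt(n targets)}."""
    activities = {}
    for tf, targets in regulons.items():
        values = [mode * gene_stats[g] for g, mode in targets.items() if g in gene_stats]
        if len(values) < min_targets:
            continue  # too few measured targets for a stable estimate
        activities[tf] = sum(values) / sqrt(len(values))
    return activities

# Hypothetical statistics and regulons.
gene_stats = {"T1": 2.0, "T2": 1.5, "T3": -1.0, "T4": 0.5}
regulons = {"TF_A": {"T1": 1, "T2": 1, "T3": -1}, "TF_B": {"T4": 1}}
activities = tf_activity(gene_stats, regulons)
print(activities)
```

Here TF_A scores high because its activated targets go up and its repressed target goes down, while TF_B is skipped for having a single measured target.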
Once their activities have been estimated, transcription factors can be linked to upstream perturbations (ligands, mutations, etc.) using network-based approaches such as causal reasoning methods. We have developed such an approach, formulated as an integer linear programming (ILP) problem (Melas et al., Integr. Biol., 2015).
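To make the causal reasoning problem concrete, the sketch below searches a toy signed network for node states in {-1, 0, +1} that are explained by an upstream perturbation and agree as well as possible with the measured signs of transcription factor activity, with a small penalty per active node to favour parsimonious explanations. It does not reproduce the ILP of Melas et al.: it enumerates all 3^n assignments instead of calling a solver, which is only feasible for a handful of nodes and illustrates why an ILP formulation is needed at realistic network sizes. The cost function, the `penalty` value and the node names are assumptions; it also assumes an acyclic network, since cycles would let nodes "explain" each other without an upstream cause.

```python
from itertools import product

def causal_reasoning(edges, perturbations, measurements, penalty=0.01):
    """Exhaustive search for node states in {-1, 0, +1} on a small signed network.

    edges: (source, target, sign) triples, sign +1 (activation) or -1 (inhibition)
    perturbations: {node: state} for fixed upstream inputs, e.g. a ligand
    measurements: {tf: sign of its estimated activity}
    Assumes an acyclic network: a non-perturbed node may be non-zero only if
    some parent explains it (parent_state * sign == node_state).
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)} - set(perturbations))
    parents = {}
    for s, t, sign in edges:
        parents.setdefault(t, []).append((s, sign))
    best, best_cost = None, float("inf")
    for states in product((-1, 0, 1), repeat=len(nodes)):
        state = {**perturbations, **dict(zip(nodes, states))}
        explained = all(
            state[n] == 0
            or any(state[p] * sign == state[n] for p, sign in parents.get(n, []))
            for n in nodes
        )
        if not explained:
            continue
        # Cost: mismatches with measured TF signs, plus a parsimony penalty.
        cost = sum(state.get(tf, 0) != obs for tf, obs in measurements.items())
        cost += penalty * sum(state[n] != 0 for n in nodes)
        if cost < best_cost:
            best, best_cost = state, cost
    return best, best_cost

# Toy network; node names are hypothetical.
edges = [("LIGAND", "RECEPTOR", 1), ("RECEPTOR", "KINASE", 1),
         ("KINASE", "TF1", 1), ("KINASE", "TF2", -1)]
best, cost = causal_reasoning(edges, {"LIGAND": 1}, {"TF1": 1, "TF2": -1})
print(best, cost)
```

In this toy case the only zero-mismatch solution activates the receptor and kinase, so the measured increase of TF1 and decrease of TF2 are both traced back to the ligand.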
Similarly to transcription factors, the activity of kinases can be inferred from the phosphorylation levels of the proteins they target (Wirbel et al., bioRxiv; Hernandez-Armenta et al., Bioinformatics, 2017).
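By analogy with the transcription factor case, a kinase's activity can be scored from the phosphosites of its known substrates. The sketch below uses the z-score of kinase-substrate enrichment analysis (KSEA): the difference between the mean log fold change of a kinase's substrate sites and the mean over all sites, multiplied by the square root of the number of substrate sites and divided by the standard deviation over all sites. This statistic is swapped in for illustration and is not claimed to be the method of the cited works; the `min_sites` cutoff and the site and kinase names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def kinase_activity_z(site_fc, kinase_substrates, min_sites=2):
    """KSEA-style z-score.

    site_fc: {phosphosite: log2 fold change}
    kinase_substrates: {kinase: set of phosphosites it is known to phosphorylate}
    z = (mean substrate FC - mean FC of all sites) * sqrt(m) / sd of all sites
    """
    all_fc = list(site_fc.values())
    mu, sd = mean(all_fc), pstdev(all_fc)
    scores = {}
    for kinase, sites in kinase_substrates.items():
        fcs = [site_fc[s] for s in sites if s in site_fc]
        if len(fcs) < min_sites or sd == 0:
            continue  # too few measured substrates, or no variation to scale by
        scores[kinase] = (mean(fcs) - mu) * sqrt(len(fcs)) / sd
    return scores

# Hypothetical phosphosite fold changes and kinase-substrate annotations.
site_fc = {"P1_S10": 2.0, "P2_T5": 1.0, "P3_S7": 0.0, "P4_Y2": -1.0}
kinase_substrates = {"KIN_A": {"P1_S10", "P2_T5"}, "KIN_B": {"P4_Y2"}}
kinase_scores = kinase_activity_z(site_fc, kinase_substrates)
print(kinase_scores)
```

A positive score means a kinase's substrates are more phosphorylated than phosphosites in general, which is read as increased kinase activity.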
Figure 1. Selected approaches to extract functional features from ‘omics’ data: pathways enriched in mutations (red); differentially activated transcription factors (brown); and pathway activities as determined by their footprints on gene expression (pink).