Saez-Rodriguez Group

Systems Biomedicine

Deriving mechanistic insights from multi-omics data

Recent technologies allow us to measure thousands of transcripts, proteins, or metabolites per sample. There are typically far fewer samples than measured features, which severely limits the statistical power of analyses such as linking molecular features to phenotypes or to the efficacy of a therapy. One strategy to address this issue is to summarise these data sets with a smaller number of features. We develop approaches that extract such features so that they can be directly mapped to biochemically distinct elements (see Fig.). This is a powerful strategy to 1) reduce the number of variables and 2) yield interpretable biological features that can be experimentally validated.

Genomics. Genomic data such as somatic mutations in cancer can be condensed into information about which pathways are mutated in a statistically significant way. We developed the tool SLAPenrich [1] to perform such analyses.
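To illustrate what "mutated in a statistically significant way" can mean, the sketch below computes a plain hypergeometric over-representation p-value for one pathway. This is a deliberately simpler, swapped-in test and is not the statistical model implemented in SLAPenrich [1]; the function name and all toy numbers are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_pathway, n_mutated, n_overlap):
    """Probability of seeing at least n_overlap pathway genes among n_mutated
    genes drawn at random (without replacement) from n_background genes,
    of which n_pathway belong to the pathway."""
    total = comb(n_background, n_mutated)
    upper = min(n_pathway, n_mutated)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_mutated - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 100 background genes, a pathway of 10 genes,
# 10 mutated genes in the cohort, 5 of which fall into the pathway.
p_value = hypergeom_enrichment_pvalue(
    n_background=100, n_pathway=10, n_mutated=10, n_overlap=5
)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value indicates that the pathway contains more mutated genes than expected by chance under this simple null model, which ignores factors such as gene length and sample-specific mutation rates.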

Transcriptomics. In the case of gene expression, a number of functional features can be extracted. A common approach is to map gene expression onto the member genes of given pathways. However, the correlation between gene expression and the function of the corresponding protein is often poor. An alternative strategy is to consider the genes known to be affected by perturbation of pathways (so-called pathway footprints [2]). For this purpose we developed the tool PROGENy [3].
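The footprint idea can be sketched as a weighted sum: each pathway has a set of responsive genes with signed weights, and a sample's pathway score is the sum of weight times expression over those genes. The code below is a minimal illustration under that assumption, not the PROGENy package itself; the pathway names, gene names, and weights are hypothetical.

```python
def footprint_scores(expression, weights):
    """Score each pathway as the weighted sum of its footprint genes.

    expression: {gene: expression value for one sample}
    weights:    {pathway: {gene: signed footprint weight}}
    Genes missing from the expression profile are skipped.
    """
    scores = {}
    for pathway, gene_weights in weights.items():
        shared = [gene for gene in gene_weights if gene in expression]
        scores[pathway] = sum(gene_weights[g] * expression[g] for g in shared)
    return scores


# Hypothetical footprint weights: positive means the gene goes up when the
# pathway is perturbed, negative means it goes down.
toy_weights = {
    "PathwayA": {"G1": 2.0, "G2": 1.0, "G3": -1.0},
    "PathwayB": {"G3": 1.5, "G4": -0.5},
}
# Hypothetical (already normalised) expression values for one sample.
toy_expression = {"G1": 1.0, "G2": 0.5, "G3": -2.0, "G4": 1.0}

scores = footprint_scores(toy_expression, toy_weights)
print(scores)
```

Because the score depends on genes downstream of the pathway rather than on the expression of the pathway members themselves, it avoids relying on the often poor correlation between a gene's expression and the function of its protein.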

Additionally, gene expression data can be used to estimate the activity of transcription factor (TF) proteins from the mRNA levels of their direct targets (TF footprints). DoRothEA is a resource containing TF-target interactions integrated from different types of evidence. Combining those interactions with statistical enrichment analysis allows the estimation of TF activities from gene expression data [4]. We have shown that the footprints of signaling pathways and TFs on gene expression are evolutionarily conserved between humans and the widely used model organism Mus musculus [5]. This opens up the possibility of functionally characterizing mouse data, in addition to human data, using PROGENy and DoRothEA. Both PROGENy and DoRothEA are available as Bioconductor packages.
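As a rough illustration of estimating TF activity from target genes, the sketch below averages gene-level statistics over a signed regulon (+1 for activated targets, -1 for repressed ones) and normalises the result against random gene sets of the same size. This is one simple enrichment scheme chosen for clarity and is not necessarily the method used in [4]; the regulon, gene names, and statistics are hypothetical.

```python
import random
from statistics import mean, pstdev


def tf_activity(gene_stats, regulon, n_perm=1000, seed=0):
    """Permutation-normalised activity score for one transcription factor.

    gene_stats: {gene: differential expression statistic}
    regulon:    {target gene: +1 (activated) or -1 (repressed)}
    """
    targets = [gene for gene in regulon if gene in gene_stats]
    signs = [regulon[gene] for gene in targets]
    observed = mean(s * gene_stats[g] for s, g in zip(signs, targets))

    # Null distribution: same signs applied to randomly drawn genes.
    rng = random.Random(seed)
    genes = list(gene_stats)
    null = []
    for _ in range(n_perm):
        sampled = rng.sample(genes, len(targets))
        null.append(mean(s * gene_stats[g] for s, g in zip(signs, sampled)))
    return (observed - mean(null)) / pstdev(null)


# Hypothetical statistics: T1 and T2 are up, T3 is down, the rest are flat.
toy_stats = {
    "T1": 3.0, "T2": 2.5, "T3": -2.8, "B1": 0.1, "B2": -0.3,
    "B3": 0.4, "B4": -0.2, "B5": 0.0, "B6": 0.2, "B7": -0.1,
}
# Hypothetical regulon: the TF activates T1 and T2 and represses T3.
toy_regulon = {"T1": 1, "T2": 1, "T3": -1}

activity = tf_activity(toy_stats, toy_regulon)
print(f"estimated TF activity (z-like score): {activity:.2f}")
```

A positive score suggests the TF is more active in the condition of interest, since its activated targets are up and its repressed targets are down; flipping the signs of the regulon flips the sign of the score.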

We have shown that our bulk, footprint-based tools PROGENy and DoRothEA can be applied to scRNA-seq data, where they partially outperform dedicated single-cell tools [6]. You can watch a summary of our activities in this area here.

Once their activities have been estimated, transcription factor and signaling pathway signatures can be linked to upstream perturbations (drugs, ligands, mutations, etc.) using network-based approaches such as causal reasoning methods. Here, we use the signed, directed interactions in OmniPath as a prior knowledge network and an integer linear programming (ILP) formulation to infer the topology of regulatory signaling networks from gene expression data. The pipeline also integrates TF and pathway scores from DoRothEA and PROGENy for network contextualization. We compiled the whole framework as a Bioconductor package called CARNIVAL [7].
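The causal reasoning step can be illustrated on a toy network. The sketch below replaces the ILP with brute-force enumeration, which is only feasible for a handful of nodes: it searches for node states (-1, 0, +1) that are sign-consistent with a signed, directed prior knowledge network, match the measured TF activities as well as possible, and keep the network small. The objective, the penalty value, the node names, and the edges are all assumptions made for illustration and do not reproduce the CARNIVAL formulation [7]. The toy network is acyclic, so loop handling is omitted.

```python
from itertools import product


def best_consistent_states(edges, inputs, measured, size_penalty=0.1):
    """Brute-force search for the cheapest sign-consistent node states.

    edges:    list of (source, target, sign) with sign +1 or -1
    inputs:   {node: fixed state} for the perturbed nodes
    measured: {node: measured activity sign} for downstream TFs
    """
    nodes = sorted({n for source, target, _ in edges for n in (source, target)})
    free = [n for n in nodes if n not in inputs]
    best_cost, best_states = None, None
    for values in product((-1, 0, 1), repeat=len(free)):
        states = dict(inputs)
        states.update(zip(free, values))
        # Every active non-input node needs a parent that explains its sign.
        explained = all(
            states[n] == 0
            or any(t == n and states[s] * sign == states[n] for s, t, sign in edges)
            for n in free
        )
        if not explained:
            continue
        mismatch = sum(abs(states[n] - value) for n, value in measured.items())
        cost = mismatch + size_penalty * sum(states[n] != 0 for n in free)
        if best_cost is None or cost < best_cost:
            best_cost, best_states = cost, states
    return best_states, best_cost


# Hypothetical prior knowledge network (signed, directed).
toy_edges = [
    ("Ligand", "Receptor", 1),
    ("Receptor", "KinaseA", 1),
    ("Receptor", "KinaseB", 1),
    ("KinaseA", "TF1", 1),
    ("KinaseB", "TF2", -1),
    ("KinaseB", "TF3", 1),
]
states, cost = best_consistent_states(
    toy_edges, inputs={"Ligand": 1}, measured={"TF1": 1, "TF2": -1}
)
print(states, cost)
```

In this toy case the search activates the path through both kinases to explain the two measured TFs and leaves the unmeasured TF3 inactive, because activating it would only add to the size penalty. An ILP solver reaches the same kind of solution on realistic networks, where enumeration is impossible.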

Approaches to extract functional features from ‘omics’ data: pathways enriched in mutations (red), transcription factors differentially activated (brown), and pathway activities as determined by their footprints on gene expression (pink).

Phosphoproteomics and metabolomics. Similarly to transcription factors, the activity of kinases can be inferred from the phosphorylation levels of the proteins they target [8], for example with the tool KinAct. Analogously, the activity of metabolic enzymes can be estimated from metabolomics data, and we are preparing Ocean, a method to do this.
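One common way to turn substrate phosphorylation into a kinase activity estimate is a z-score in the style of kinase-substrate enrichment analysis: the mean log fold change of a kinase's known substrate sites is compared with the mean over all measured sites. The sketch below assumes this scoring scheme for illustration and is not the KinAct implementation; site names, values, and kinase-substrate assignments are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def kinase_z_score(site_log_fc, substrates):
    """z-score of the mean substrate log fold change against all sites.

    site_log_fc: {phosphosite: log fold change}
    substrates:  phosphosites annotated as targets of the kinase
    """
    observed = [site_log_fc[s] for s in substrates if s in site_log_fc]
    all_values = list(site_log_fc.values())
    shift = mean(observed) - mean(all_values)
    return shift * sqrt(len(observed)) / pstdev(all_values)


# Hypothetical phosphoproteomics measurements (log fold changes).
toy_sites = {"S1": 2.0, "S2": 1.5, "S3": -0.5, "S4": 0.0, "S5": -1.0, "S6": 0.0}

# Hypothetical kinase-substrate annotations.
z_kinase_x = kinase_z_score(toy_sites, ["S1", "S2"])
z_kinase_y = kinase_z_score(toy_sites, ["S3", "S5"])
print(f"KinaseX: {z_kinase_x:.2f}, KinaseY: {z_kinase_y:.2f}")
```

The same pattern carries over to metabolic enzymes in principle, with metabolite abundances taking the place of phosphosite levels, although the mapping from metabolites to enzymes is less direct.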

Multi-omic causal integration. By taking advantage of the conceptual similarities between footprint approaches, we have expanded the use of CARNIVAL to more diverse types of omic data within the tool COSMOS [9]. Here, we can combine phosphoproteomic, transcriptomic, and metabolomic data to find causal paths linking signaling, gene-regulatory, and metabolic processes.

Multi-omic data can be integrated within the tool COSMOS [9].


  1. Iorio F, Garcia-Alonso L, Brammeld JS, Martincorena I, Wille DR, McDermott U, et al. Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich. Scientific Reports. 2018. doi:10.1038/s41598-018-25076-6
  2. Dugourd A, Saez-Rodriguez J. Footprint-based functional analysis of multiomic data. Current Opinion in Systems Biology. 2019. doi:10.1016/j.coisb.2019.04.002
  3. Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, et al. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nat Commun. 2018;9: 20. doi:10.1038/s41467-017-02391-6
  4. Garcia-Alonso L, Holland CH, Ibrahim MM, Turei D, Saez-Rodriguez J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 2019;29: 1363–1375. doi:10.1101/gr.240663.118
  5. Holland CH, Szalai B, Saez-Rodriguez J. Transfer of regulatory knowledge from human to mouse for functional genomic analysis: Supplementary Document. 2019. doi:10.1101/532739
  6. Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol. 2020;21: 36. doi:10.1186/s13059-020-1949-z
  7. Liu A, Trairatphisan P, Gjerga E, Didangelos A, Barratt J, Saez-Rodriguez J. From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL. 2019. doi:10.1101/541888
  8. Wirbel J, Cutillas P, Saez-Rodriguez J. Phosphoproteomics-Based Profiling of Kinase Activities in Cancer Cells. Methods in Molecular Biology. 2018. pp. 103–132. doi:10.1007/978-1-4939-7493-1_6
  9. Dugourd A, Kuppe C, Sciacovelli M, Gjerga E, Gabor A, Emdal KB, et al. Causal integration of multi‐omics data with prior knowledge to generate mechanistic hypotheses. Molecular Systems Biology. 2021. doi:10.15252/msb.20209730