DeeDeeExperiment: Building an infrastructure for integrating and managing omics data analysis results in R/Bioconductor

Dec 5, 2025 · 7:26
q-bio.GN

Abstract

Summary: Modern omics experiments now involve multiple conditions and complex designs, producing an increasingly large set of differential expression and functional enrichment analysis results. However, no standardized data structure exists to store and contextualize these results together with their metadata, leaving researchers with an unmanageable and potentially non-reproducible collection of results that are difficult to navigate and/or share. Here we introduce DeeDeeExperiment, a new S4 class for managing and storing omics data analysis results, implemented within the Bioconductor ecosystem, which promotes interoperability, reproducibility and good documentation. This class extends the widely used SingleCellExperiment object by introducing dedicated slots for Differential Expression (DEA) and Functional Enrichment Analysis (FEA) results, allowing users to organize, store, and retrieve information on multiple contrasts and associated metadata within a single data object, ultimately streamlining the management and interpretation of many omics datasets.

Availability and implementation: DeeDeeExperiment is available on Bioconductor under the MIT license (https://bioconductor.org/packages/DeeDeeExperiment), with its development version also available on GitHub (https://github.com/imbeimainz/DeeDeeExperiment).

Summary

The paper addresses the challenge of managing and integrating the large, heterogeneous sets of results produced by modern omics experiments, in particular differential expression analysis (DEA) and functional enrichment analysis (FEA). The authors point to the lack of a standardized data structure for storing, contextualizing, and sharing these results alongside their metadata, which creates reproducibility risks and makes complex datasets hard to navigate. To address this, they introduce DeeDeeExperiment, a new S4 class implemented within the Bioconductor ecosystem in R. The class extends the widely used SingleCellExperiment object with dedicated slots for DEA and FEA results, so that users can organize, store, and retrieve information across multiple contrasts within a single data object.

The class follows Bioconductor's principles of interoperability and reproducibility, which keeps it compatible with existing downstream analysis tools. By storing the original DEA results objects together with feature-level statistics, DeeDeeExperiment supports efficient retrieval, exploration, and contextualization of analysis results. The framework also includes methods for adding, removing, and summarizing stored results, along with helper functions for integrating results from analysis tools such as limma and muscat. Together, these features give omics analysts a standardized, reproducible framework for managing and sharing complex experimental results.

Key Insights

  • DeeDeeExperiment introduces a novel S4 class that extends the SingleCellExperiment object, adding dedicated slots for Differential Expression Analysis (DEA) and Functional Enrichment Analysis (FEA) results.
  • The class provides a standardized data structure for organizing, linking, and contextualizing multiple DEA and FEA results together with their associated metadata, addressing a critical gap in the field.
  • DeeDeeExperiment stores the full original DEA results objects (e.g., DESeq2, edgeR, limma objects) in the metadata slot, while extracting and embedding feature-level statistics (logFC, p-value, adjusted p-value) into the rowData slot. This avoids data duplication and facilitates efficient retrieval.
  • The `dea` and `fea` slots are implemented as a set of named contrasts, each carrying basic metadata such as the package and version used to generate the results, enhancing reproducibility.
  • The `addScenarioInfo()` method allows users to store contextual information related to specific DEA results, further improving reproducibility and enabling machine-assisted interpretation with LLMs.
  • DeeDeeExperiment maintains compatibility with the broad ecosystem of Bioconductor tools, allowing users to readily visualize and explore the data with tools like scater and iSEE.
  • The `summary()` method provides a quick contrast-level overview, such as the number of up/down-regulated genes or enriched terms, facilitating efficient data exploration.
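
The storage layout described in the points above can be pictured with a small conceptual sketch. This is not the DeeDeeExperiment API, and it is written in Python rather than R purely for illustration. The names `ResultsContainer`, `Contrast`, `add_dea`, `remove_dea`, the column names, and the example values are all hypothetical. The sketch assumes one table of per-feature statistics per contrast, keeps the original result object in a metadata store, records package and version per contrast, and counts up- and down-regulated features at an assumed adjusted p-value cutoff of 0.05.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# (log2 fold change, p-value, adjusted p-value) for one feature
Stats = Tuple[float, float, float]


@dataclass
class Contrast:
    """Provenance for one named DEA contrast (hypothetical layout)."""
    package: str             # tool that produced the result
    version: str             # version of that tool
    scenario_info: str = ""  # free-text context for the contrast


@dataclass
class ResultsContainer:
    """Illustrative stand-in only; not the DeeDeeExperiment class itself."""
    features: List[str]
    # contrast name -> column name -> values ordered like `features`
    row_data: Dict[str, Dict[str, List[float]]] = field(default_factory=dict)
    # contrast name -> original, untouched result object
    metadata: Dict[str, Any] = field(default_factory=dict)
    # contrast name -> provenance record
    dea: Dict[str, Contrast] = field(default_factory=dict)

    def add_dea(self, name: str, table: Dict[str, Stats],
                package: str, version: str, scenario_info: str = "") -> None:
        """Register a contrast: keep the original and extract its statistics."""
        missing = [f for f in self.features if f not in table]
        if missing:
            raise KeyError(f"no statistics for features: {missing}")
        rows = [table[f] for f in self.features]
        self.row_data[name] = {
            "log2FC": [r[0] for r in rows],
            "pvalue": [r[1] for r in rows],
            "padj": [r[2] for r in rows],
        }
        self.metadata[name] = table
        self.dea[name] = Contrast(package, version, scenario_info)

    def remove_dea(self, name: str) -> None:
        """Drop a contrast from all three stores."""
        del self.dea[name]
        del self.metadata[name]
        del self.row_data[name]

    def summary(self, alpha: float = 0.05) -> Dict[str, Dict[str, Any]]:
        """Count significant up/down features per contrast."""
        out: Dict[str, Dict[str, Any]] = {}
        for name, cols in self.row_data.items():
            significant = [lfc for lfc, padj
                           in zip(cols["log2FC"], cols["padj"]) if padj < alpha]
            info = self.dea[name]
            out[name] = {
                "up": sum(lfc > 0 for lfc in significant),
                "down": sum(lfc < 0 for lfc in significant),
                "package": f"{info.package} {info.version}",
            }
        return out


# Made-up statistics for three made-up genes; the version is a placeholder.
container = ResultsContainer(features=["geneA", "geneB", "geneC"])
container.add_dea(
    "treated_vs_control",
    {"geneA": (2.1, 0.001, 0.004),
     "geneB": (-1.7, 0.002, 0.006),
     "geneC": (0.2, 0.6, 0.7)},
    package="limma", version="0.0.0",
    scenario_info="made-up example contrast",
)
print(container.summary())
```

With the made-up values above, the summary reports one up-regulated and one down-regulated feature for `treated_vs_control`. Keying every store by contrast name is what lets a contrast be added or removed as a unit, which is the property the bullet points attribute to the real class.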

Practical Implications

  • DeeDeeExperiment provides a practical solution for researchers and bioinformaticians who struggle to manage and integrate the large volume of results generated by omics experiments, particularly in single-cell RNA-seq and other complex experimental designs.
  • The standardized data structure promotes reproducibility and facilitates collaboration by providing a unified container for DEA and FEA results, along with their associated metadata.
  • Practitioners can use DeeDeeExperiment to organize, retrieve, explore, contextualize, and interpret their analysis results more efficiently across multiple contrasts, enabling more nuanced and quantitative approaches beyond simple overlap strategies.
  • The compatibility with existing Bioconductor tools allows users to seamlessly integrate DeeDeeExperiment into their existing workflows for downstream analysis and visualization.
  • Future research directions include capturing additional provenance details such as sessions and environments, ensuring an even higher standard of reproducibility, and integrating DeeDeeExperiment objects with emerging AI-driven tools for machine-assisted interpretation.
