Mendelian Estimation in Distributed Standardized Analytics
Federated two-sample Mendelian Randomization on the OMOP Common Data Model.
PACKAGE UNDER CONSTRUCTION: DO NOT USE
Overview
Medusa implements two-sample Mendelian Randomization (MR) natively within the OHDSI ecosystem. It enables causal inference across distributed health networks without requiring individual-level data to leave any site.
The core innovation is one-shot federated pooling via profile likelihood aggregation: each site computes a log-likelihood profile (a numeric vector) and shares only site-level summary files built around that vector. The coordinator sums the profiles across sites to obtain a pooled estimate in a single round, with no iterative communication protocol.
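As a rough illustration of the pooling step, the base R sketch below sums toy profiles on a shared grid and reads off a pooled estimate and a likelihood-based interval. The grid `betaGrid`, the helper `makeProfile()`, and the quadratic profile shapes are assumptions made for this example; they are not Medusa's actual objects or file format.

```r
# Hypothetical shared grid of candidate causal effects (log odds ratio scale).
betaGrid <- seq(-1, 1, by = 0.01)

# Toy quadratic log-likelihood profiles standing in for what each site shares.
# makeProfile() is an illustrative helper, not a Medusa function.
makeProfile <- function(betaHat, se) -0.5 * ((betaGrid - betaHat) / se)^2
siteProfiles <- list(
  siteA = makeProfile(betaHat = 0.20, se = 0.10),
  siteB = makeProfile(betaHat = 0.30, se = 0.20)
)

# One-shot pooling: log-likelihoods of independent sites add pointwise.
pooledProfile <- Reduce(`+`, siteProfiles)

# Pooled point estimate: grid value that maximizes the summed profile.
pooledEstimate <- betaGrid[which.max(pooledProfile)]

# 95% likelihood-based interval: grid values whose log-likelihood lies
# within qchisq(0.95, 1) / 2 of the maximum.
cutoff <- max(pooledProfile) - qchisq(0.95, df = 1) / 2
pooledCi <- range(betaGrid[pooledProfile >= cutoff])
```

A pointwise sum like this only makes sense if every site evaluates its profile on the same grid, which is why the grid is treated here as part of what the coordinator distributes.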
Architecture
┌──────────────────────────────────────────────────────────┐
│                     COORDINATOR NODE                     │
│                                                          │
│  getMRInstruments() ──────► Instrument Table             │
│          │                  (from OpenGWAS)              │
│          │ distribute                                    │
│          ▼                                               │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐                   │
│  │ Site A  │  │ Site B  │  │ Site C  │  OMOP CDM         │
│  │─────────│  │─────────│  │─────────│  Sites            │
│  │buildMR  │  │buildMR  │  │buildMR  │                   │
│  │Cohort() │  │Cohort() │  │Cohort() │                   │
│  │    ▼    │  │    ▼    │  │    ▼    │                   │
│  │fitOut   │  │fitOut   │  │fitOut   │                   │
│  │come     │  │come     │  │come     │                   │
│  │Model()  │  │Model()  │  │Model()  │                   │
│  │    │    │  │    │    │  │    │    │                   │
│  └────┼────┘  └────┼────┘  └────┼────┘                   │
│       │            │            │                        │
│       ▼            ▼            ▼                        │
│    Profile      Profile      Profile   ◄── Only these    │
│    vectors      vectors      vectors       leave sites   │
│       │            │            │                        │
│       └────────────┼────────────┘                        │
│                    ▼                                     │
│        poolLikelihoodProfiles()                          │
│                    │                                     │
│                    ▼                                     │
│           computeMREstimate()                            │
│        runSensitivityAnalyses()                          │
│           generateMRReport()                             │
└──────────────────────────────────────────────────────────┘
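To make the site-side box of the diagram more concrete, here is a minimal base R sketch of how a profile log-likelihood for one coefficient can be evaluated: the coefficient is fixed at each grid value through an offset and the remaining parameters are re-estimated. It uses simulated data and `stats::glm()` as a stand-in; the variable names (`geneticScore`, `age`, `betaGrid`) are hypothetical, and how `fitOutcomeModel()` does this internally is not shown in this README.

```r
set.seed(42)

# Simulated site data (all names and effect sizes are illustrative).
n <- 2000
geneticScore <- rnorm(n)   # stand-in for an instrument-based score
age <- rnorm(n)            # stand-in for an adjustment covariate
outcome <- rbinom(n, 1, plogis(-1 + 0.3 * geneticScore + 0.2 * age))

# Grid of candidate values for the coefficient of interest.
betaGrid <- seq(-0.5, 1, by = 0.05)

# Profile log-likelihood: fix the coefficient at b via an offset, then
# maximize over the nuisance parameters (intercept and age).
profileLogLik <- vapply(betaGrid, function(b) {
  fit <- glm(outcome ~ age + offset(b * geneticScore), family = binomial())
  as.numeric(logLik(fit))
}, numeric(1))

# The numeric vector profileLogLik is the kind of object a site would share;
# no row-level data is needed to use it downstream.
siteEstimate <- betaGrid[which.max(profileLogLik)]
```

The grid maximizer should sit close to the coefficient from an ordinary fit of `outcome ~ age + geneticScore`, which is a quick sanity check a site can run locally before sharing anything.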
Installation
# Install from GitHub
remotes::install_github("OHDSI/Medusa")
# Or install with all dependencies
remotes::install_github("OHDSI/Medusa", dependencies = TRUE)
Quick Start
library(Medusa)
# 1. Assemble instruments from OpenGWAS (coordinator)
instruments <- getMRInstruments(
  exposureTraitId = "ieu-a-1119", # IL-6 receptor
  pThreshold = 5e-8,
  r2Threshold = 0.001
)
# 2. At each site: build cohort and fit outcome model
# connectionDetails is assumed to exist already, e.g. created with
# DatabaseConnector::createConnectionDetails()
cohort <- buildMRCohort(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  outcomeCohortId = 1234,
  instrumentTable = instruments,
  genomicDatabaseSchema = "genomics" # Schema with VARIANT_OCCURRENCE
)
profile <- fitOutcomeModel(
  cohortData = cohort,
  instrumentTable = instruments,
  siteId = "my_site"
)
# 3. At coordinator: pool profiles and estimate
# profileA and profileB are the profile objects received from two sites (step 2)
combined <- poolLikelihoodProfiles(list(siteA = profileA, siteB = profileB))
estimate <- computeMREstimate(combined, instruments)
# 4. Generate report
generateMRReport(
  mrEstimate = estimate,
  combinedProfile = combined,
  exposureLabel = "IL-6 signaling",
  outcomeLabel = "Colorectal cancer"
)
Main Functions
| Function | Module | Description |
|---|---|---|
| getMRInstruments() | Instrument Assembly | Query OpenGWAS for instruments, LD clump |
| createInstrumentTable() | Instrument Assembly | Build instruments from local data |
| buildMRCohort() | Cohort Extraction | Extract cohort + genotypes from OMOP CDM |
| buildMRCovariates() | Covariate Assembly | Assemble covariates via FeatureExtraction |
| runInstrumentDiagnostics() | Diagnostics | F-stats, PheWAS, allele-frequency and missingness checks |
| fitOutcomeModel() | Outcome Model | Fit outcome model, evaluate profile likelihood |
| poolLikelihoodProfiles() | Pooling | Sum log-likelihood profiles across sites |
| computeMREstimate() | Estimation | Wald ratio with delta method SE |
| runSensitivityAnalyses() | Sensitivity | IVW, MR-Egger, weighted median, etc. |
| generateMRReport() | Reporting | Self-contained HTML report |
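For readers unfamiliar with the estimators named in the table, the following base R sketch shows the textbook formulas for a single-instrument Wald ratio with a delta-method standard error and for a fixed-effect inverse-variance weighted (IVW) estimate. The helper names `waldRatio()` and `ivwEstimate()` and all input numbers are hypothetical; this is not the code behind `computeMREstimate()` or `runSensitivityAnalyses()`.

```r
# Wald ratio for one instrument: SNP-outcome effect over SNP-exposure effect.
# The SE uses the delta-method approximation for a ratio of two independent
# estimates. waldRatio() is an illustrative helper, not a Medusa function.
waldRatio <- function(betaX, seX, betaY, seY) {
  estimate <- betaY / betaX
  se <- sqrt(seY^2 / betaX^2 + betaY^2 * seX^2 / betaX^4)
  c(estimate = estimate, se = se)
}

# Fixed-effect IVW across several instruments: a weighted regression of
# SNP-outcome effects on SNP-exposure effects through the origin, with
# weights 1 / seY^2. ivwEstimate() is likewise illustrative.
ivwEstimate <- function(betaX, betaY, seY) {
  weights <- 1 / seY^2
  estimate <- sum(weights * betaX * betaY) / sum(weights * betaX^2)
  se <- sqrt(1 / sum(weights * betaX^2))
  c(estimate = estimate, se = se)
}

# Made-up summary statistics for demonstration only.
wald <- waldRatio(betaX = 0.5, seX = 0.05, betaY = 0.1, seY = 0.02)
ivw <- ivwEstimate(
  betaX = c(0.1, 0.2),
  betaY = c(0.02, 0.05),
  seY = c(0.01, 0.01)
)
```

The fixed-effect IVW shown here ignores uncertainty in the SNP-exposure effects; random-effects variants and MR-Egger relax different assumptions and are not sketched.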
Vignettes
- Getting Started — Installation, concepts, quick example with simulated data
- IL-6 and Colorectal Cancer — Complete scientific walkthrough
- Federated Analysis Guide — Network coordinator instructions
Requirements
- R >= 4.1.0
- OHDSI packages: DatabaseConnector, SqlRender, Cyclops, FeatureExtraction
- OMOP CDM database with OMOP Genomic CDM (VARIANT_OCCURRENCE table)
- For instrument retrieval: internet access to OpenGWAS API
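A quick way to check the R-side requirements before installing might look like the sketch below. It only tests the R version and whether the listed OHDSI packages are installed; the variable names are hypothetical, and it does not verify database or OpenGWAS access.

```r
# Packages listed under Requirements.
requiredPackages <- c("DatabaseConnector", "SqlRender", "Cyclops", "FeatureExtraction")

# TRUE if the running R version meets the stated minimum.
rVersionOk <- getRversion() >= "4.1.0"

# Names of required packages that are not currently installed.
isInstalled <- vapply(requiredPackages, requireNamespace, logical(1), quietly = TRUE)
missingPackages <- requiredPackages[!isInstalled]

if (!rVersionOk) {
  message("R >= 4.1.0 is required; found ", as.character(getRversion()))
}
if (length(missingPackages) > 0) {
  message("Missing packages: ", paste(missingPackages, collapse = ", "))
}
```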
References
- Davey Smith & Hemani (2014). Mendelian randomization: genetic anchors for causal inference. Human Molecular Genetics.
- Bowden et al. (2015). Mendelian randomization with invalid instruments: MR-Egger. IJE.
- Bowden et al. (2016). Weighted median estimator. Genetic Epidemiology.
- Suchard et al. (2013). Cyclops: massive parallelization of serial inference. ACM TOMACS.
