
Instrument diagnostics for MR analysis
runInstrumentDiagnostics.Rd
Performs comprehensive instrument validation using OMOP covariate data. This is a key differentiating feature of the Medusa package. Diagnostics include first-stage F-statistics, instrument PheWAS across OMOP covariate domains, negative control outcome tests, allele frequency comparisons, and genotype missingness summaries.
The results feed directly into the HTML report generated by generateMRReport.
Usage
runInstrumentDiagnostics(
cohortData,
covariateData,
instrumentTable,
exposureProxyConceptIds = NULL,
negativeControlOutcomeIds = NULL,
pValueThreshold = NULL
)
Arguments
- cohortData
Data frame. Output of buildMRCohort.
- covariateData
Either the list output of buildMRCovariates or a plain data frame with person-level covariates.
- instrumentTable
Data frame. Output of getMRInstruments.
- exposureProxyConceptIds
Optional integer vector of OMOP concept IDs for an exposure measurement proxy (e.g., IL-6 lab measurement). Used to compute first-stage F-statistic from actual data rather than GWAS approximation.
- negativeControlOutcomeIds
Optional integer vector of cohort IDs for negative control outcomes to analyze from the nc_outcome_<id> columns present in cohortData. If NULL, all such columns are used.
- pValueThreshold
Numeric. P-value threshold for PheWAS significance. If NULL (the default), a Bonferroni-corrected threshold is used (0.05 / number of covariates tested).
Value
A named list with class "medusaDiagnostics" containing:
- fStatistics
Data frame with snp_id, fStatistic, and weakFlag.
- phewasResults
Data frame with snp_id, covariate_id, covariate_name, beta, se, pval, domain_id, significant.
- negativeControlResults
NULL when negative controls were not requested and no nc_outcome_<id> columns are present. Otherwise a data frame with one row per analyzed negative control outcome and columns outcome_id, beta_ZY, se_ZY, beta_MR, se_MR, and pval. Returns an empty data frame when requested negative control columns are unavailable in cohortData.
- afComparison
Data frame with snp_id, eaf_gwas, eaf_cohort, eaf_diff, discrepancyFlag.
- missingnessReport
Data frame with snp_id, n_total, n_missing, pct_missing, highMissingFlag.
- diagnosticFlags
Named logical vector summarizing which checks raised concerns.
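As a rough illustration of the missingnessReport columns listed above, the following base R sketch derives them from a toy genotype matrix. The genotype data, the reading of pct_missing as a percentage, and the 5% cutoff for highMissingFlag are all assumptions made for this example; the package's actual cutoff is not stated here.

```r
# Toy genotype dosages (0/1/2) with some missing calls; data are made up
genotypes <- data.frame(
  rs1 = c(0, 1, NA, 2, 1, 0, NA, 1, 0, 1),
  rs2 = c(2, 1, 1, 0, 2, 1, 1, 0, 1, 2)
)

nTotal <- nrow(genotypes)
nMissing <- unname(colSums(is.na(genotypes)))
pctMissing <- 100 * nMissing / nTotal

# Assumed cutoff: flag SNPs with more than 5% missing genotypes
missingnessReport <- data.frame(
  snp_id = colnames(genotypes),
  n_total = nTotal,
  n_missing = nMissing,
  pct_missing = pctMissing,
  highMissingFlag = pctMissing > 5
)
print(missingnessReport)
```

In this toy input, rs1 has 2 of 10 genotypes missing and is flagged, while rs2 has none.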
Details
Run Instrument Validation Diagnostics
F-statistic: If exposureProxyConceptIds are provided, the
first-stage F-statistic is computed by regressing the proxy measurement
on each SNP. Otherwise, the GWAS approximation (beta_ZX / se_ZX)^2 is used.
F < 10 indicates a potentially weak instrument.
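The two F-statistic routes described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the summary statistics and the simulated proxy measurement are invented, and only the column names beta_ZX, se_ZX, fStatistic, and weakFlag come from this page.

```r
# Route 1: GWAS approximation, F = (beta_ZX / se_ZX)^2 (hypothetical values)
instruments <- data.frame(
  snp_id  = c("rs1", "rs2", "rs3"),
  beta_ZX = c(0.12, 0.03, 0.08),
  se_ZX   = c(0.02, 0.015, 0.01)
)
instruments$fStatistic <- (instruments$beta_ZX / instruments$se_ZX)^2
instruments$weakFlag <- instruments$fStatistic < 10  # F < 10: potentially weak
print(instruments)

# Route 2: regress a proxy measurement on the SNP (simulated data)
set.seed(1)
snp <- rbinom(500, 2, 0.3)           # genotype dosage 0/1/2
proxy <- 0.4 * snp + rnorm(500)      # stand-in for a lab measurement
fit <- lm(proxy ~ snp)
fProxy <- unname(summary(fit)$fstatistic["value"])
print(fProxy)
```

With these made-up inputs, only rs2 falls below the F < 10 rule of thumb.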
PheWAS: Each SNP is regressed against every covariate in the covariate matrix using logistic (binary) or linear (continuous) regression. Bonferroni correction is applied. Significant associations may indicate pleiotropy.
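A minimal base R sketch of a single-SNP PheWAS scan with a Bonferroni threshold follows. It assumes each covariate is modeled as the outcome and the SNP dosage as the predictor, with the model family chosen by whether the covariate is coded 0/1. The simulated data and the helper isBinary are hypothetical and not part of Medusa.

```r
set.seed(42)
n <- 500
snp <- rbinom(n, 2, 0.3)  # genotype dosage 0/1/2 for one SNP

# Simulated covariates: one binary, one continuous
covariates <- data.frame(
  cov_binary = rbinom(n, 1, 0.2),
  cov_continuous = rnorm(n)
)

# Hypothetical helper: treat a covariate as binary if it only contains 0/1
isBinary <- function(x) all(x %in% c(0, 1))

phewas <- do.call(rbind, lapply(names(covariates), function(nm) {
  y <- covariates[[nm]]
  fam <- if (isBinary(y)) binomial() else gaussian()
  fit <- glm(y ~ snp, family = fam)
  est <- summary(fit)$coefficients["snp", ]
  data.frame(
    covariate_name = nm,
    beta = est[["Estimate"]],
    se = est[["Std. Error"]],
    pval = est[[4]]  # fourth column holds the p-value for both families
  )
}))

# Bonferroni threshold: 0.05 / number of covariates tested
threshold <- 0.05 / nrow(phewas)
phewas$significant <- phewas$pval < threshold
print(phewas)
```

A full scan would repeat this per SNP and divide by the total number of covariates tested, as described for pValueThreshold.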
Allele frequency check: Compares the effect allele frequency in the cohort to the GWAS reference. A discrepancy > 0.1 may indicate strand flip or population mismatch.
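The comparison can be illustrated in base R as below. This sketch assumes genotypes are coded as effect-allele dosages (0/1/2), so the cohort frequency is the mean dosage divided by 2, and that eaf_diff is an absolute difference. The toy genotypes and GWAS frequencies are invented for the example.

```r
# Toy effect-allele dosages for two SNPs; NA marks a missing genotype
genotypes <- data.frame(
  rs1 = c(0, 1, 2, 1, 0, NA, 1, 0),
  rs2 = c(2, 2, 1, 2, 2, 2, 1, 2)
)
# Hypothetical effect allele frequencies reported by the GWAS
eafGwas <- c(rs1 = 0.30, rs2 = 0.70)

snps <- colnames(genotypes)
eafCohort <- unname(colMeans(genotypes, na.rm = TRUE)) / 2
eafRef <- unname(eafGwas[snps])
eafDiff <- abs(eafCohort - eafRef)

afComparison <- data.frame(
  snp_id = snps,
  eaf_gwas = eafRef,
  eaf_cohort = eafCohort,
  eaf_diff = eafDiff,
  discrepancyFlag = eafDiff > 0.1  # threshold taken from the text above
)
print(afComparison)
```

Here rs2 is flagged because its cohort frequency (0.875) differs from the GWAS value (0.70) by more than 0.1.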
Negative controls: When nc_outcome_<id> columns are present,
Medusa runs runNegativeControlAnalysis on the requested subset
of outcomes. The negativeControlFailure diagnostic flag reflects the
aggregate systematic-bias signal from that analysis rather than whether any
single negative control reaches nominal p < 0.05.
References
Bowden, J., et al. (2018). Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. International Journal of Epidemiology, 48(3), 728-742.
Examples
# Using simulated data
simData <- simulateMRData(n = 1000, nSnps = 5)
covData <- simulateCovariateData(n = 1000, nCovariates = 20)