Instrument diagnostics for MR analysis

Runs a set of instrument validation checks using OMOP covariate data. This is a key differentiating feature of the Medusa package. Diagnostics include first-stage F-statistics, an instrument PheWAS across OMOP covariate domains, negative control outcome tests, allele frequency comparisons, and genotype missingness summaries.

The results feed directly into the HTML report generated by generateMRReport.

Usage

runInstrumentDiagnostics(
  cohortData,
  covariateData,
  instrumentTable,
  exposureProxyConceptIds = NULL,
  negativeControlOutcomeIds = NULL,
  pValueThreshold = NULL
)

Arguments

cohortData: Data frame. Output of buildMRCohort.
covariateData: Either the list output of buildMRCovariates or a plain data frame with person-level covariates.
instrumentTable: Data frame. Output of getMRInstruments.
exposureProxyConceptIds: Optional integer vector of OMOP concept IDs for an exposure measurement proxy (e.g., an IL-6 lab measurement). Used to compute the first-stage F-statistic from observed data rather than from the GWAS approximation.
negativeControlOutcomeIds: Optional integer vector of cohort IDs for negative control outcomes to analyze from the nc_outcome_<id> columns present in cohortData. If NULL, all such columns are used.
pValueThreshold: Numeric. P-value threshold for PheWAS significance. If NULL (the default), a Bonferroni-corrected threshold of 0.05 / (number of covariates tested) is used.
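The default threshold can be illustrated with a few lines of base R. This is a minimal sketch on made-up p-values, assuming (as stated above) that the denominator is the number of covariates tested; it also shows that the rule is equivalent to comparing Bonferroni-adjusted p-values from p.adjust() against 0.05.

```r
# Hypothetical p-values for five covariates tested against one SNP.
pvals <- c(0.001, 0.004, 0.03, 0.2, 0.5)

# Default threshold: 0.05 divided by the number of covariates tested.
threshold <- 0.05 / length(pvals)
significant <- pvals < threshold

# The same decision expressed with base R's Bonferroni adjustment.
significantAdj <- p.adjust(pvals, method = "bonferroni") < 0.05
stopifnot(identical(significant, significantAdj))
```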

Value

A named list with class "medusaDiagnostics" containing:

fStatistics: Data frame with snp_id, fStatistic, and weakFlag.
phewasResults: Data frame with snp_id, covariate_id, covariate_name, beta, se, pval, domain_id, significant.
negativeControlResults: NULL when no negative controls were requested and no nc_outcome_<id> columns are present in cohortData. Otherwise, a data frame with one row per analyzed negative control outcome and columns outcome_id, beta_ZY, se_ZY, beta_MR, se_MR, and pval. If the requested negative control columns are not available in cohortData, this element is an empty data frame.
afComparison: Data frame with snp_id, eaf_gwas, eaf_cohort, eaf_diff, discrepancyFlag.
missingnessReport: Data frame with snp_id, n_total, n_missing, pct_missing, highMissingFlag.
diagnosticFlags: Named logical vector summarizing which checks raised concerns.
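To make the shape of missingnessReport concrete, the sketch below builds a table with the same columns from a small hypothetical genotype data frame (one column per SNP, NA for a missing call). It is not Medusa's implementation: treating pct_missing as a percentage and using a 5% cut-off for highMissingFlag are both assumptions for illustration.

```r
# Hypothetical genotypes: one column per SNP, NA marks a missing call.
genotypes <- data.frame(
  rs1 = c(0, 1, 1, 2, 1, 0, 1, 2, 0, 1),
  rs2 = c(NA, NA, 1, 0, 2, 1, NA, 0, 1, 1)
)

missingnessReport <- data.frame(
  snp_id = names(genotypes),
  n_total = nrow(genotypes),
  n_missing = unname(colSums(is.na(genotypes)))
)
missingnessReport$pct_missing <-
  100 * missingnessReport$n_missing / missingnessReport$n_total

# Assumed cut-off: flag SNPs with more than 5% missing calls.
missingnessReport$highMissingFlag <- missingnessReport$pct_missing > 5
```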

Details

Run Instrument Validation Diagnostics

F-statistic: If exposureProxyConceptIds are provided, the first-stage F-statistic is computed by regressing the proxy measurement on each SNP. Otherwise, the GWAS approximation (beta_ZX / se_ZX)^2 is used. F < 10 indicates a potentially weak instrument.
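The two routes to the F-statistic can be sketched in base R on simulated data. The genotype dosage g, proxy x, and the summary statistics betaZX and seZX are hypothetical stand-ins, not Medusa objects. For a single SNP, the first-stage F from lm() equals the squared t-statistic of the SNP coefficient, which is why (beta_ZX / se_ZX)^2 serves as the summary-data approximation.

```r
set.seed(1)
n <- 500
g <- rbinom(n, size = 2, prob = 0.3)   # hypothetical genotype dosage (0/1/2)
x <- 0.4 * g + rnorm(n)                # hypothetical exposure proxy

# 1. Individual-level route: regress the proxy on the SNP.
fit <- lm(x ~ g)
fObserved <- summary(fit)$fstatistic[["value"]]

# 2. Summary-data route: F approximated by (beta_ZX / se_ZX)^2.
betaZX <- 0.12                         # hypothetical GWAS estimates
seZX <- 0.02
fApprox <- (betaZX / seZX)^2           # (0.12 / 0.02)^2 = 36

# F < 10 flags a potentially weak instrument.
weakFlag <- c(observed = fObserved, approx = fApprox) < 10
```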

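PheWAS: For each SNP, every covariate in the covariate matrix is regressed on the SNP genotype, using logistic regression for binary covariates and linear regression for continuous ones. A Bonferroni-corrected significance threshold is applied (see pValueThreshold). Significant associations may indicate pleiotropy, i.e., a possible violation of the exclusion restriction assumption.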
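A minimal single-SNP version of this scan is sketched below. The helper phewasOneSnp() is hypothetical, not part of Medusa; it assumes covariates arrive as a data frame with one column per covariate and that binary covariates are coded 0/1. It returns a subset of the phewasResults columns (covariate_name, beta, se, pval, significant).

```r
# Hypothetical helper: regress each covariate on genotype dosage g.
phewasOneSnp <- function(g, covariates, pValueThreshold = NULL) {
  if (is.null(pValueThreshold)) {
    pValueThreshold <- 0.05 / ncol(covariates)   # Bonferroni default
  }
  rows <- lapply(names(covariates), function(name) {
    y <- covariates[[name]]
    # Logistic regression for 0/1 covariates, linear otherwise.
    fit <- if (all(y %in% c(0, 1))) {
      glm(y ~ g, family = binomial())
    } else {
      lm(y ~ g)
    }
    est <- coef(summary(fit))["g", ]   # estimate, SE, statistic, p-value
    data.frame(covariate_name = name, beta = est[[1]], se = est[[2]],
               pval = est[[4]])
  })
  out <- do.call(rbind, rows)
  out$significant <- out$pval < pValueThreshold
  out
}

# Usage on simulated data.
set.seed(42)
n <- 1000
g <- rbinom(n, 2, 0.3)
covs <- data.frame(
  diabetes = rbinom(n, 1, plogis(-1 + 0.5 * g)),
  age = rnorm(n, 60, 10)
)
res <- phewasOneSnp(g, covs)
```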

Allele frequency check: Compares the effect allele frequency observed in the cohort with the GWAS reference value. A discrepancy greater than 0.1 may indicate a strand flip or a population mismatch.
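The comparison can be sketched with hypothetical data, assuming genotypes are coded as the count of the effect allele (0, 1, or 2) with NA for missing calls; under that coding the cohort frequency is half the mean dosage. Flagging on the absolute difference is an assumption about how eaf_diff is thresholded.

```r
# Cohort effect allele frequency from dosage (count of effect alleles).
eafFromDosage <- function(dosage) mean(dosage, na.rm = TRUE) / 2

dosage <- c(0, 1, 2, 1, NA, 0, 1, 2)   # hypothetical genotype calls
eafCohort <- eafFromDosage(dosage)     # 7 alleles / 14 = 0.5
eafGwas <- 0.30                        # hypothetical GWAS reference
eafDiff <- eafCohort - eafGwas
discrepancyFlag <- abs(eafDiff) > 0.1
```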

Negative controls: When nc_outcome_<id> columns are present, Medusa runs runNegativeControlAnalysis on the requested subset of outcomes. The negativeControlFailure diagnostic flag reflects the aggregate systematic-bias signal from that analysis rather than whether any single negative control reaches nominal p < 0.05.
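One common way to produce the beta_MR and se_MR columns for a single SNP is the Wald ratio with a first-order standard error. The sketch below uses that technique on simulated data; whether runNegativeControlAnalysis uses exactly this estimator is an assumption, and waldRatioNC() is a hypothetical helper. Because the first-order SE ignores uncertainty in beta_ZX, the resulting p-value equals that of the SNP-outcome association itself.

```r
# Hypothetical helper: Wald ratio for one binary negative control outcome.
waldRatioNC <- function(nc, g, betaZX) {
  est <- coef(summary(glm(nc ~ g, family = binomial())))["g", ]
  betaZY <- est[["Estimate"]]
  seZY <- est[["Std. Error"]]
  betaMR <- betaZY / betaZX            # Wald ratio estimate
  seMR <- seZY / abs(betaZX)           # first-order (delta method) SE
  pval <- 2 * pnorm(-abs(betaMR / seMR))
  data.frame(beta_ZY = betaZY, se_ZY = seZY, beta_MR = betaMR,
             se_MR = seMR, pval = pval)
}

set.seed(7)
g <- rbinom(2000, 2, 0.25)             # hypothetical genotype dosage
nc <- rbinom(2000, 1, 0.1)             # outcome simulated with no SNP effect
res <- waldRatioNC(nc, g, betaZX = 0.15)
```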

References

Bowden, J., et al. (2018). Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption. International Journal of Epidemiology, 48(3), 728-742.

Examples

# Using simulated data
simData <- simulateMRData(n = 1000, nSnps = 5)
covData <- simulateCovariateData(n = 1000, nCovariates = 20)