
Instrument assembly from OpenGWAS
getMRInstruments.Rd
Queries the IEU OpenGWAS database via the ieugwasr package to retrieve
GWAS summary statistics for a specified exposure trait, applies LD clumping
to obtain independent instruments, computes approximate F-statistics, flags
strand-ambiguous SNPs, and returns a clean instrument table ready for
distribution to federated analysis sites.
This function runs at the coordinator node. The returned instrument table is serialized to disk and distributed to all sites unchanged.
Usage
getMRInstruments(
exposureTraitId,
pThreshold = 5e-08,
r2Threshold = 0.001,
kb = 10000,
ancestryPopulation = "EUR",
additionalSnps = NULL
)
Arguments
- exposureTraitId
Character string. IEU OpenGWAS trait ID (e.g., "ieu-a-1119" for IL-6 receptor levels).
- pThreshold
Numeric. Genome-wide significance p-value threshold for selecting SNPs. Default is 5e-8.
- r2Threshold
Numeric. LD clumping r-squared threshold. SNP pairs with r-squared above this value will be pruned, keeping the more significant SNP. Default is 0.001.
- kb
Numeric. LD clumping window in kilobases. Default is 10000.
- ancestryPopulation
Character. Reference panel ancestry population for LD clumping. One of "EUR", "EAS", "AFR", "SAS", "AMR". Default is "EUR".
- additionalSnps
Optional character vector of SNP rsIDs to force-include in the instrument set (added after clumping, not subject to LD pruning). Default is NULL.
Value
A data frame with one row per independent instrument SNP and columns:
- snp_id
rsID of the SNP (e.g., "rs2228145").
- effect_allele
The allele to which beta_ZX refers. A positive beta_ZX means this allele is associated with a higher exposure level.
- other_allele
The non-effect allele.
- beta_ZX
Effect size of the SNP on the exposure (log scale).
- se_ZX
Standard error of beta_ZX.
- pval_ZX
P-value for the SNP-exposure association.
- eaf
Effect allele frequency in the GWAS reference population.
- gene_region
Nearest gene or genomic region annotation.
- fStatistic
Approximate F-statistic: (beta_ZX / se_ZX)^2.
- strandAmbiguous
Logical. TRUE if the SNP is strand-ambiguous (A/T or G/C allele pair).
The data frame also carries the following attributes:
- retrievalTimestamp
POSIXct timestamp of when instruments were retrieved.
- exposureTraitId
The trait ID used for retrieval.
- parameters
List of all parameter values used.
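The attributes can be read back with attr(). The sketch below builds a toy table by hand and round-trips it through saveRDS()/readRDS(); the choice of RDS as the on-disk format is an assumption made for illustration (this page does not name the serialization format), and all values are made up.

```r
# Toy instrument table; values are illustrative only.
instruments <- data.frame(
  snp_id  = c("rsA", "rsB"),
  beta_ZX = c(0.30, -0.12),
  se_ZX   = c(0.05, 0.03),
  stringsAsFactors = FALSE
)

# Attach the three documented attributes by hand.
attr(instruments, "retrievalTimestamp") <- Sys.time()
attr(instruments, "exposureTraitId") <- "ieu-a-1119"
attr(instruments, "parameters") <- list(
  pThreshold = 5e-8,
  r2Threshold = 0.001,
  kb = 10000,
  ancestryPopulation = "EUR"
)

# Assumption: RDS is used for the on-disk copy sent to sites.
path <- tempfile(fileext = ".rds")
saveRDS(instruments, path)
restored <- readRDS(path)

# The attributes survive the round trip.
attr(restored, "exposureTraitId")
attr(restored, "parameters")$r2Threshold
```

A plain-text format such as CSV stores only the tabular data, so the attributes would need to be shipped separately if such a format were used.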
Details
Retrieve and Clump Genetic Instruments for Mendelian Randomization
The function queries the IEU OpenGWAS API for all SNP associations with the
specified trait below pThreshold, then applies LD clumping using the
specified reference panel to retain only independent instruments. The
approximate F-statistic is computed as (beta_ZX / se_ZX)^2 for each SNP. SNPs
with F < 10 are reported in a warning as potentially weak instruments
but are not removed automatically.
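The F-statistic and strand-ambiguity columns described above can be reproduced on any table with the documented column names. The following is a minimal sketch, not the package's implementation: annotateInstruments() is a hypothetical helper and the input values are made up.

```r
# Hypothetical helper: adds fStatistic and strandAmbiguous to a table that
# already has beta_ZX, se_ZX, effect_allele and other_allele columns.
annotateInstruments <- function(instruments, weakThreshold = 10) {
  # Approximate F-statistic per SNP: (beta / SE)^2.
  instruments$fStatistic <- (instruments$beta_ZX / instruments$se_ZX)^2

  # A SNP is strand-ambiguous when its two alleles are complementary bases.
  allelePair <- paste0(instruments$effect_allele, instruments$other_allele)
  instruments$strandAmbiguous <- allelePair %in% c("AT", "TA", "CG", "GC")

  # Warn about weak instruments but keep them in the table.
  weak <- instruments$snp_id[instruments$fStatistic < weakThreshold]
  if (length(weak) > 0) {
    warning("Potentially weak instruments (F < ", weakThreshold, "): ",
            paste(weak, collapse = ", "), call. = FALSE)
  }
  instruments
}

# Toy input; values are illustrative only.
toy <- data.frame(
  snp_id        = c("rsA", "rsB", "rsC"),
  effect_allele = c("A", "G", "T"),
  other_allele  = c("T", "A", "C"),
  beta_ZX       = c(0.30, -0.12, 0.05),
  se_ZX         = c(0.05, 0.03, 0.02),
  stringsAsFactors = FALSE
)

annotated <- suppressWarnings(annotateInstruments(toy))
annotated[, c("snp_id", "fStatistic", "strandAmbiguous")]
```

In this toy input only rsC falls below the threshold, and only rsA (an A/T pair) is strand-ambiguous.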
If the OpenGWAS API is unavailable, the function throws an informative error suggesting the user provide a cached instrument table instead.
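One way to act on that error at the coordinator is to wrap the call and fall back to a cached table. The sketch below is an assumption about how a caller might do this, not part of the package: loadInstruments() and the RDS cache file are hypothetical, and the retrieval step is passed in as a function so the example runs without network access.

```r
# Hypothetical wrapper: run a retrieval function and fall back to a cached
# table (assumed here to be an RDS file) if the retrieval fails.
loadInstruments <- function(retrieve, cachePath) {
  tryCatch(
    retrieve(),
    error = function(e) {
      if (!file.exists(cachePath)) {
        stop("Retrieval failed and no cached table found at ", cachePath,
             ": ", conditionMessage(e), call. = FALSE)
      }
      message("Retrieval failed (", conditionMessage(e),
              "); using cached table.")
      readRDS(cachePath)
    }
  )
}

# Demonstration with a stand-in for a failing API call.
cachePath <- tempfile(fileext = ".rds")
saveRDS(data.frame(snp_id = "rsA", stringsAsFactors = FALSE), cachePath)
failingRetrieve <- function() stop("OpenGWAS API unavailable")

instruments <- loadInstruments(failingRetrieve, cachePath)
```

With the real function, the first argument would be something like function() getMRInstruments("ieu-a-1119").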
References
Hemani, G., et al. (2018). The MR-Base platform supports systematic causal inference across the human phenome. eLife, 7, e34408.
Examples
# Using simulated data (no API call)
instruments <- simulateInstrumentTable(nSnps = 10)
head(instruments)
#> snp_id effect_allele other_allele beta_ZX se_ZX pval_ZX
#> 1 rs3781070 T G -0.4659224 0.07424188 3.479855e-10
#> 2 rs2440057 A G 0.4748302 0.02832261 4.395942e-63
#> 3 rs1521507 T G -0.2144558 0.07933350 6.867154e-03
#> 4 rs5432853 G T 0.4321791 0.07680009 1.830262e-08
#> 5 rs2154668 T C -0.3566982 0.02494625 2.230363e-46
#> 6 rs6986500 A G -0.3076384 0.05085271 1.452087e-09
#> eaf gene_region
#> 1 0.6580465 GENE1
#> 2 0.9345355 GENE2
#> 3 0.7335898 GENE3
#> 4 0.5598396 GENE4
#> 5 0.8147207 GENE5
#> 6 0.2205265 GENE6
if (FALSE) { # \dontrun{
# Real API call (requires internet)
instruments <- getMRInstruments(
exposureTraitId = "ieu-a-1119",
pThreshold = 5e-8,
r2Threshold = 0.001,
kb = 10000,
ancestryPopulation = "EUR"
)
} # }