
Outcome model with exact grid likelihood evaluation
fitOutcomeModel.RdFits a logistic regression of the binary outcome on SNP genotype(s) plus covariates, then evaluates the profile log-likelihood across a pre-specified grid of beta_ZY values. This is the core methodological function for federated MR: each site runs this locally and shares only the resulting log-likelihood profile vector.
Two analysis modes are supported: alleleScore fits a single model
using a weighted allele score as the genetic exposure variable, while
perSNP fits separate models for each SNP (needed for multi-SNP
sensitivity analyses).
Usage
fitOutcomeModel(
cohortData,
covariateData = NULL,
instrumentTable,
betaGrid = seq(-3, 3, by = 0.01),
regularizationVariance = 0.1,
instrumentRegularization = FALSE,
outcomeType = "binary",
analysisType = "alleleScore",
siteId = "site_1",
modelBackend = "glm"
)Arguments
- cohortData
Data frame. Output of
buildMRCohort.- covariateData
Covariate data. Output of
buildMRCovariatesor a data frame with person_id and covariate columns.- instrumentTable
Data frame. Output of
getMRInstruments.- betaGrid
Numeric vector. Grid of beta_ZY values at which to evaluate the profile log-likelihood. Default is
seq(-3, 3, by = 0.01)(601 grid points).- regularizationVariance
Numeric. When
modelBackend = "cyclops", this is the variance of the normal prior applied to nuisance coefficients. Smaller values imply stronger shrinkage; useInffor an unpenalized Cyclops fit. Ignored by the"glm"backend. Default is 0.1.- instrumentRegularization
Logical. When
modelBackend = "cyclops", whether the exposure coefficient is included in the Cyclops prior. Default is FALSE so the profiled exposure coefficient remains unpenalized. Ignored by the"glm"backend.- outcomeType
Character. Type of outcome: "binary" for logistic regression. Default is "binary".
- analysisType
Character. "alleleScore" for single weighted score model or "perSNP" for separate per-SNP models. Default is "alleleScore".
- siteId
Character. Identifier for this site. Included in the returned profile object. Default is "site_1".
- modelBackend
Character. Outcome-model fitting backend:
"glm"uses base R logistic regression, while"cyclops"uses Cyclops for scalable logistic regression with optional Gaussian shrinkage on nuisance covariates. Default is"glm".
Value
A list with class "medusaSiteProfile" containing:
- siteId
Character site identifier.
- betaGrid
Numeric vector of grid points (same as input).
- logLikProfile
Numeric vector of profile log-likelihood values, same length as betaGrid. In
perSNPmode this remains the valid allele-score profile used for pooling.- nCases
Number of outcome cases.
- nControls
Number of controls.
- snpIds
Character vector of SNP IDs used.
- diagnosticFlags
List of diagnostic flags from the model fit.
- betaHat
Point estimate of beta_ZY (MLE from profile).
- seHat
Approximate standard error from profile curvature.
- perSnpEstimates
When
analysisType = "perSNP", a data frame of per-SNP beta_ZY / se_ZY estimates for summarized-data sensitivity analyses.
This object contains no individual-level data and is safe to share.
Details
Fit Outcome Model and Evaluate Profile Log-Likelihood
The profile log-likelihood evaluation works as follows:
Fit the unconstrained model to obtain the MLE and its Wald SE.
At each grid point, fix beta_ZY to that value and re-fit the nuisance parameters using the fixed term as an offset.
Record the maximized constrained log-likelihood at each grid point. This is the exact profile likelihood for the coefficient in a logistic generalized linear model.
When modelBackend = "cyclops" and regularizationVariance is
finite, the nuisance parameters are estimated under a Gaussian prior at both
the unconstrained optimum and every grid point. In that configuration the
returned objective is a penalized profile, not an unpenalized MLE profile.
When using the allele score, weights are \(w_j = \gamma_j / \mathrm{SE}(\gamma_j)^2\), normalized by \(\sum_j |w_j|\). The same weights are reused downstream so that the MR denominator matches the exact score fitted at each site.
Missing SNP dosages in the allele-score model are imputed to the expected dosage \(2 \times \mathrm{EAF}_j\) from the instrument table, rather than being treated as homozygous reference. This avoids a systematic downward bias in the score when genotype missingness is present.
References
Suchard, M. A., et al. (2013). Massive parallelization of serial inference algorithms for a complex generalized linear model. ACM Transactions on Modeling and Computer Simulation, 23(1), 1-17.
Examples
simData <- simulateMRData(n = 2000, nSnps = 5, trueEffect = 0.3)
profile <- fitOutcomeModel(
cohortData = simData$data,
covariateData = NULL,
instrumentTable = simData$instrumentTable,
betaGrid = seq(-2, 2, by = 0.05)
)
#> Fitting outcome model at site 'site_1' (1171 cases, 829 controls)...
#> Site 'site_1': beta_ZY_hat = 0.3273 (SE = 0.1919).
plot(profile$betaGrid, profile$logLikProfile, type = "l",
xlab = "beta_ZY", ylab = "Profile log-likelihood")