Skip to contents

Runs locally at each OMOP CDM site. Executes parameterized SQL to extract outcome cohort data and genotype data, performs allele harmonization to ensure genotype coding matches the instrument table, and returns a local R data frame suitable for downstream analysis. The returned data never leaves the site.

Genotype data is extracted from the VARIANT_OCCURRENCE table defined by the OMOP CDM Genomic Extension. The minimal required columns from that table are:

  • person_id — links variants to persons

  • rs_id — dbSNP rs identifier for the variant

  • genotype — genotype call (VCF-style "0/0", "0/1", "1/1" or plain integer "0", "1", "2")

Additionally, reference_allele and alternate_allele are used for allele harmonization when available.

The function also queries: PERSON (age, sex), CONDITION_OCCURRENCE (outcome), and OBSERVATION_PERIOD (eligibility). All SQL is rendered and translated via SqlRender for cross-dialect compatibility.

Usage

buildMRCohort(
  connectionDetails,
  cdmDatabaseSchema,
  cohortDatabaseSchema,
  cohortTable,
  outcomeCohortId,
  instrumentTable,
  genomicDatabaseSchema = cdmDatabaseSchema,
  indexDateOffset = 0,
  washoutPeriod = 365,
  excludePriorOutcome = TRUE,
  negativeControlCohortIds = NULL
)

Arguments

connectionDetails

A DatabaseConnector::connectionDetails object specifying the database connection.

cdmDatabaseSchema

Character. Schema containing the OMOP CDM tables.

cohortDatabaseSchema

Character. Schema containing the cohort table.

cohortTable

Character. Name of the cohort table.

outcomeCohortId

Integer. Cohort definition ID for the outcome of interest (e.g., incident colorectal cancer).

instrumentTable

Data frame. Output of getMRInstruments or createInstrumentTable containing the instrument SNPs.

genomicDatabaseSchema

Character. Schema containing the VARIANT_OCCURRENCE table from the OMOP Genomic Extension. Defaults to cdmDatabaseSchema.

indexDateOffset

Integer. Days offset from cohort start date for defining the index date. Default is 0.

washoutPeriod

Integer. Minimum days of prior observation required before index date. Default is 365.

excludePriorOutcome

Logical. If TRUE, persons with the outcome before their index date are excluded. Default is TRUE.

negativeControlCohortIds

Optional integer vector of cohort definition IDs for negative control outcomes. When provided, the function extracts negative control outcome flags and adds nc_outcome_<id> columns (binary 0/1) to the returned data frame. These flags are evaluated from each person's index date onward so they align with the outcome risk window used in the main analysis. These columns are used by runNegativeControlAnalysis for empirical calibration.

Value

A data frame with one row per person and columns:

person_id

Integer person identifier.

outcome

Integer 0/1 outcome status.

age_at_index

Age in years at index date.

gender_concept_id

OMOP concept ID for gender.

index_date

Date of cohort entry.

snp_<sanitized rsID>

Integer genotype values (0, 1, or 2) for each instrument SNP, coded as count of the effect allele after harmonization. Missing genotypes are NA, not 0.

Details

Build Study Cohort for Mendelian Randomization at a Single Site

Genotype coding: genotypes are coded as 0, 1, 2 representing the count of effect alleles. The function performs allele harmonization by comparing the effect allele in the instrument table to the allele coding in the genotype data (alternate allele from VARIANT_OCCURRENCE). If alleles are swapped, genotypes are flipped (2 - genotype) and instrument beta_ZX is negated.

References

OHDSI Genomic CDM: https://github.com/OHDSI/Genomic-CDM

Hripcsak, G., et al. (2015). Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Studies in Health Technology and Informatics, 216, 574-578.

Examples

if (FALSE) { # \dontrun{
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/ohdsi",
  user = "user",
  password = "password"
)
instruments <- getMRInstruments("ieu-a-1119")
cohort <- buildMRCohort(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "cdm",
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  outcomeCohortId = 1234,
  instrumentTable = instruments,
  genomicDatabaseSchema = "genomics"
)
} # }