CohortSurvival


CohortSurvival contains functions for extracting and summarising survival data from databases mapped to the OMOP Common Data Model (CDM).

Installation

You can install the development version of CohortSurvival from GitHub like so:

install.packages("remotes")
remotes::install_github("darwin-eu/CohortSurvival")

Example usage

Create a reference to data in the OMOP CDM format

The CohortSurvival package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the CDMConnector package.

library(CDMConnector)
library(CohortSurvival)
library(dplyr)
library(ggplot2)

For example, creating a connection to a Postgres database would look like this:

con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("CDM5_POSTGRESQL_DBNAME"),
  host = Sys.getenv("CDM5_POSTGRESQL_HOST"),
  user = Sys.getenv("CDM5_POSTGRESQL_USER"),
  password = Sys.getenv("CDM5_POSTGRESQL_PASSWORD")
)

cdm <- CDMConnector::cdm_from_con(con,
  cdm_schema = Sys.getenv("CDM5_POSTGRESQL_CDM_SCHEMA"),
  write_schema = Sys.getenv("CDM5_POSTGRESQL_RESULT_SCHEMA")
)

Example: MGUS

For this example we'll use a cdm reference containing the mgus2 dataset from the survival package, which we have transformed into a set of OMOP CDM style cohort tables. The mgus2 dataset contains survival data for 1341 sequential patients with monoclonal gammopathy of undetermined significance (MGUS). For more information see ?survival::mgus2.

cdm <- CohortSurvival::mockMGUS2cdm()
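For orientation, the original data frame behind this mock is available directly from the survival package. The snippet below is a minimal sketch that only inspects the columns we assume correspond to the cohort tables (futime/death for the death cohort, ptime/pstat for progression). It does not call any CohortSurvival function, and the column-to-cohort mapping is our assumption rather than documented behaviour of mockMGUS2cdm().

```r
library(survival)

# Load the original mgus2 data frame shipped with the survival package
data(mgus2, package = "survival")

# One row per patient. Assumed mapping to the cohort tables:
#   futime / death -> months to death or last contact, and death indicator (0/1)
#   ptime  / pstat -> months to progression or last contact, and progression indicator (0/1)
str(mgus2[, c("id", "age", "sex", "futime", "death", "ptime", "pstat")])
```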

In this example cdm reference we have three cohort tables of interest:

1) MGUS diagnosis cohort

cdm$mgus_diagnosis %>%
  glimpse()
#> Rows: ??
#> Columns: 10
#> Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
#> $ cohort_start_date    <date> 1981-01-01, 1968-01-01, 1980-01-01, 1977-01-01, …
#> $ cohort_end_date      <date> 1981-01-01, 1968-01-01, 1980-01-01, 1977-01-01, …
#> $ age                  <dbl> 88, 78, 94, 68, 90, 90, 89, 87, 86, 79, 86, 89, 8…
#> $ sex                  <fct> F, F, M, M, F, M, F, F, F, F, M, F, M, F, M, F, F…
#> $ hgb                  <dbl> 13.1, 11.5, 10.5, 15.2, 10.7, 12.9, 10.5, 12.3, 1…
#> $ creat                <dbl> 1.30, 1.20, 1.50, 1.20, 0.80, 1.00, 0.90, 1.20, 0…
#> $ mspike               <dbl> 0.5, 2.0, 2.6, 1.2, 1.0, 0.5, 1.3, 1.6, 2.4, 2.3,…
#> $ age_group            <chr> ">=70", ">=70", ">=70", "<70", ">=70", ">=70", ">…

2) MGUS progression cohort

cdm$progression %>%
  glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <dbl> 56, 81, 83, 111, 124, 127, 147, 163, 165, 167, 18…
#> $ cohort_start_date    <date> 1978-01-30, 1985-01-15, 1974-08-17, 1993-01-14, …
#> $ cohort_end_date      <date> 1978-01-30, 1985-01-15, 1974-08-17, 1993-01-14, …

3) Death cohort

cdm$death_cohort %>%
  glimpse()
#> Rows: ??
#> Columns: 4
#> Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/:memory:]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 1…
#> $ cohort_start_date    <date> 1981-01-31, 1968-01-26, 1980-02-16, 1977-04-03, …
#> $ cohort_end_date      <date> 1981-01-31, 1968-01-26, 1980-02-16, 1977-04-03, …

MGUS diagnosis to death

We can get survival estimates for death following diagnosis like so:

MGUS_death <- estimateSingleEventSurvival(cdm,
  targetCohortTable = "mgus_diagnosis",
  targetCohortId = 1,
  outcomeCohortTable = "death_cohort",
  outcomeCohortId = 1
)

plotSurvival(MGUS_death)
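As a rough cross-check, a comparable single-event estimate can be computed with the survival package directly. This is a sketch that assumes estimateSingleEventSurvival() produces Kaplan-Meier style estimates; mgus2 measures time in months, whereas the cohort tables use dates, so the time axes will not match one-to-one.

```r
library(survival)

data(mgus2, package = "survival")

# Kaplan-Meier estimate of survival after MGUS diagnosis (time in months)
km_death <- survfit(Surv(futime, death) ~ 1, data = mgus2)

# Estimated survival probability at 1, 5 and 10 years after diagnosis
summary(km_death, times = c(12, 60, 120))
```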

Stratified results

MGUS_death <- estimateSingleEventSurvival(cdm,
  targetCohortTable = "mgus_diagnosis",
  targetCohortId = 1,
  outcomeCohortTable = "death_cohort",
  outcomeCohortId = 1, 
  strata = list(c("age_group"),
                c("sex"),
                c("age_group", "sex"))
)

plotSurvival(MGUS_death,
             colour = "strata_level",
             facet = "strata_name")
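The stratified analysis above fits one curve per level of each strata variable. A minimal sketch of the same idea with survival::survfit(), assuming the age_group variable in the cohort table uses a cut-off at 70 years (inferred from the "<70" and ">=70" labels in the glimpse output, not from the package source):

```r
library(survival)

data(mgus2, package = "survival")

# Recreate the age groups shown in the cohort table (assumed cut-off: 70 years)
mgus2$age_group <- ifelse(mgus2$age < 70, "<70", ">=70")

# One Kaplan-Meier curve per stratum: by sex, and by age group and sex combined
km_sex <- survfit(Surv(futime, death) ~ sex, data = mgus2)
km_age_sex <- survfit(Surv(futime, death) ~ age_group + sex, data = mgus2)

# Names of the strata, e.g. "age_group=<70, sex=F"
names(km_age_sex$strata)
```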

Summary statistics on survival

We can summarise our results in a table.

tableSurvival(MGUS_death) 
#> # A tibble: 9 × 10
#>   cdm_name cohort         variable_level analysis_type outcome   age_group sex  
#>   <chr>    <chr>          <chr>          <chr>         <chr>     <chr>     <chr>
#> 1 mock     mgus_diagnosis death_cohort   single_event  death_co… overall   over…
#> 2 mock     mgus_diagnosis death_cohort   single_event  death_co… <70       over…
#> 3 mock     mgus_diagnosis death_cohort   single_event  death_co… >=70      over…
#> 4 mock     mgus_diagnosis death_cohort   single_event  death_co… overall   F    
#> 5 mock     mgus_diagnosis death_cohort   single_event  death_co… overall   M    
#> 6 mock     mgus_diagnosis death_cohort   single_event  death_co… <70       F    
#> 7 mock     mgus_diagnosis death_cohort   single_event  death_co… <70       M    
#> 8 mock     mgus_diagnosis death_cohort   single_event  death_co… >=70      F    
#> 9 mock     mgus_diagnosis death_cohort   single_event  death_co… >=70      M    
#> # ℹ 3 more variables: number_records <dbl>, events <dbl>,
#> #   `Median survival (95% CI)` <chr>
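The Median survival (95% CI) column summarises each curve by the time at which estimated survival first drops to 50%. A minimal sketch of the equivalent calculation with survival::survfit(), stratified by sex and keeping mgus2's monthly time scale (so the values are in months, not days):

```r
library(survival)

data(mgus2, package = "survival")

km_sex <- survfit(Surv(futime, death) ~ sex, data = mgus2)

# Events, median survival and its 95% confidence limits per stratum (months)
median_tab <- summary(km_sex)$table[, c("events", "median", "0.95LCL", "0.95UCL")]
median_tab
```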

Estimating survival with competing risk

The package also allows us to estimate survival for an outcome in the presence of a competing risk outcome. We can then stratify, see information on events, summarise the estimates and check the contributing participants in the same way as for the single event survival analysis.

MGUS_death_prog <- estimateCompetingRiskSurvival(cdm,
  targetCohortTable = "mgus_diagnosis",
  outcomeCohortTable = "progression",
  competingOutcomeCohortTable = "death_cohort"
)

plotSurvival(MGUS_death_prog, cumulativeFailure = TRUE,
             colour = "variable_level")
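With competing risks, the cumulative incidence of each event is usually estimated with the Aalen-Johansen estimator rather than as one minus a Kaplan-Meier curve. The sketch below builds a competing-risk version of mgus2 following the approach of the survival package's competing-risks vignette: progression (PCM) is the outcome and death without prior progression is the competing event. Whether estimateCompetingRiskSurvival() uses exactly this estimator internally is an assumption on our part.

```r
library(survival)

data(mgus2, package = "survival")

# Time to the first event: progression if it occurred, otherwise death or censoring
etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
event <- factor(event, levels = 0:2, labels = c("censor", "pcm", "death"))

# With a factor status, survfit() returns Aalen-Johansen estimates of the
# probability of being in each state (event-free, progressed, died) over time
aj_fit <- survfit(Surv(etime, event) ~ 1)

# Cumulative incidence of progression and death at 5, 10 and 20 years
summary(aj_fit, times = c(60, 120, 240))
```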

Estimating survival with competing risk and strata

Similarly, we can estimate competing risk survival stratified by variables that have already been added to the target cohort table. To plot only the strata levels, we filter out the results for the overall cohort before passing them to plotSurvival().

MGUS_death_prog <- estimateCompetingRiskSurvival(cdm,
  targetCohortTable = "mgus_diagnosis",
  outcomeCohortTable = "progression",
  competingOutcomeCohortTable = "death_cohort",
  strata = list(c("sex"))
)

plotSurvival(MGUS_death_prog %>%
               dplyr::filter(strata_name != "Overall"),
             cumulativeFailure = TRUE,
             facet = "strata_level",
             colour = "variable_level")
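The same Aalen-Johansen sketch can be stratified by adding a covariate to the formula. As before, this uses the survival package on the raw mgus2 data (times in months) and is only an illustration of the technique, not a reproduction of CohortSurvival's internals.

```r
library(survival)

data(mgus2, package = "survival")

# Competing-risk time and status: progression (pcm) versus death without progression
mgus2$etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
mgus2$event <- factor(with(mgus2, ifelse(pstat == 0, 2 * death, 1)),
                      levels = 0:2, labels = c("censor", "pcm", "death"))

# One set of cumulative incidence curves per sex
aj_sex <- survfit(Surv(etime, event) ~ sex, data = mgus2)

summary(aj_sex, times = c(60, 120, 240))
```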

Disconnect from the CDM database

cdm_disconnect(cdm)