`twoxtwo` provides a collection of utilities for data analysis with two-by-two contingency tables. The functions in the package allow users to conveniently aggregate and summarize observation-level data as counts.

The two-by-two table is used in epidemiology to summarize count data by combinations of binary *exposure* and *outcome* variables as follows:

| | OUTCOME + | OUTCOME - |
|---|---|---|
| EXPOSURE + | A | B |
| EXPOSURE - | C | D |

The notation in the table above corresponds to:

- A = Exposed (Exposure “+”) and health indicator present (Outcome “+”)
- B = Exposed (Exposure “+”) and health indicator absent (Outcome “-”)
- C = Unexposed (Exposure “-”) and health indicator present (Outcome “+”)
- D = Unexposed (Exposure “-”) and health indicator absent (Outcome “-”)

The package allows for construction of two-by-two tables, as well as direct calculation of measures of effect and hypothesis testing to assess the relationship between the epidemiological *exposure* and *outcome* variables.

The usage demonstration below requires that the `twoxtwo` and `dplyr` packages are loaded:

```
library(twoxtwo)
library(dplyr)
```

The data set used to illustrate the `twoxtwo` functions will be observation-level data describing smoking status (*exposure*) and high blood pressure (*outcome*). For this example, there will be 100 smokers and 200 non-smokers. Of the smokers, 40 will have high blood pressure; of the non-smokers, 50 will have high blood pressure:

```
sh <- tibble(
  smoke = c(rep(TRUE, 100), rep(FALSE, 200)),
  hbp = c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))
)
```

```
sh
# # A tibble: 300 x 2
# smoke hbp
# <lgl> <dbl>
# 1 TRUE 1
# 2 TRUE 1
# 3 TRUE 1
# 4 TRUE 1
# 5 TRUE 1
# 6 TRUE 1
# 7 TRUE 1
# 8 TRUE 1
# 9 TRUE 1
# 10 TRUE 1
# # … with 290 more rows
```

The `twoxtwo()` constructor function will aggregate the observations to counts by exposure and outcome:

```
sh_2x2 <-
  sh %>%
  twoxtwo(., exposure = smoke, outcome = hbp)
```
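For readers who want to see what this aggregation amounts to, the sketch below reproduces the same counts with base R alone. It is an illustration, not the package's implementation; the object names `counts` and `cells` are made up for this example.

```r
# Rebuild the example data with base R only (no twoxtwo or dplyr needed).
smoke <- c(rep(TRUE, 100), rep(FALSE, 200))
hbp   <- c(rep(1, 40), rep(0, 60), rep(1, 50), rep(0, 150))

# Put the "+" level first so rows and columns follow the A/B/C/D layout.
counts <- table(
  exposure = factor(smoke, levels = c(TRUE, FALSE)),
  outcome  = factor(hbp, levels = c(1, 0))
)

# Pull out the four cells using the two-by-two notation (hypothetical helper list).
cells <- list(A = counts[1, 1], B = counts[1, 2],
              C = counts[2, 1], D = counts[2, 2])

print(counts)
```

This mirrors only the counting step: a plain `table` carries none of the exposure and outcome metadata that the constructor keeps alongside the counts.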

The `twoxtwo` object is an instance of an `S3` class. When printed to the console, it displays the counts in the contingency table:

```
sh_2x2
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
```

The object is a list with multiple elements, each of which can be extracted by name if needed.

For example, to view the aggregated counts as a `tibble`:

```
sh_2x2$tbl
# # A tibble: 2 x 4
# hbp_1 hbp_0 exposure outcome
# <dbl> <dbl> <chr> <chr>
# 1 40 60 smoke::TRUE/FALSE hbp::1/0
# 2 50 150 smoke::TRUE/FALSE hbp::1/0
```

To view counts of each cell per the two-by-two notation:

```
sh_2x2$cells
# $A
# [1] 40
#
# $B
# [1] 60
#
# $C
# [1] 50
#
# $D
# [1] 150
```

To view the exposure variable and its levels:

```
sh_2x2$exposure
# $variable
# [1] "smoke"
#
# $levels
# [1] "TRUE" "FALSE"
```

To view the outcome variable and its levels:

```
sh_2x2$outcome
# $variable
# [1] "hbp"
#
# $levels
# [1] "1" "0"
```

To view the number of observations missing either exposure or outcome:

```
sh_2x2$n_missing
# [1] 0
```

And to view the original data (stored in the `twoxtwo` object by default^{1}):

```
sh_2x2$data
# # A tibble: 300 x 2
# smoke hbp
# <lgl> <dbl>
# 1 TRUE 1
# 2 TRUE 1
# 3 TRUE 1
# 4 TRUE 1
# 5 TRUE 1
# 6 TRUE 1
# 7 TRUE 1
# 8 TRUE 1
# 9 TRUE 1
# 10 TRUE 1
# # … with 290 more rows
```

The `S3` class has a summary method, which summarizes the count data and computes measures of effect (odds ratio, risk ratio, and risk difference). When the summary is printed, it displays the count data, information about the `twoxtwo` object (missing data and exposure/outcome), and the effect measures:

```
sh_2x2 %>%
  summary(.)
#
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
#
#
# Outcome: hbp
# Outcome + : 1
# Outcome - : 0
#
# Exposure: smoke
# Exposure + : TRUE
# Exposure - : FALSE
#
# Number of missing observations: 0
#
# Odds Ratio: 2 (1.198,3.338)
# Risk Ratio: 1.6 (1.139,2.247)
# Risk Difference: 0.15 (0.037,0.263)
```

When the summary is assigned to an object, it stores a named list with the effect measures:

```
sh_2x2_sum <-
  sh_2x2 %>%
  summary(.)
#
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=TRUE |40 |60 |
# |EXPOSURE |smoke=FALSE |50 |150 |
#
#
# Outcome: hbp
# Outcome + : 1
# Outcome - : 0
#
# Exposure: smoke
# Exposure + : TRUE
# Exposure - : FALSE
#
# Number of missing observations: 0
#
# Odds Ratio: 2 (1.198,3.338)
# Risk Ratio: 1.6 (1.139,2.247)
# Risk Difference: 0.15 (0.037,0.263)
```

```
sh_2x2_sum
# $odds_ratio
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
#
# $risk_ratio
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
#
# $risk_difference
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
```

```
do.call("rbind", sh_2x2_sum)
# # A tibble: 3 x 6
# measure estimate ci_lower ci_upper exposure outcome
# * <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
# 2 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
# 3 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
```

Note that the measures of effect are only computed in the `twoxtwo()` summary if the “retain” argument is set to `TRUE`.

Individual measures of effect (odds ratio, risk ratio, and risk difference) can be calculated directly. Each measure includes the point estimate and a confidence interval based on the specified \(\alpha\) and the standard error of the estimate. If the user passes a `twoxtwo` object into a data analysis function, the exposure and outcome will be inherited:

```
sh_2x2 %>%
  odds_ratio()
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
```

```
sh_2x2 %>%
  risk_ratio()
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
```

```
sh_2x2 %>%
  risk_diff()
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
```

Alternatively, users can directly perform data analysis *without* first creating a `twoxtwo` object:

```
sh %>%
  odds_ratio(., exposure = smoke, outcome = hbp, alpha = 0.05)
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 2 1.20 3.34 smoke::TRUE/FALSE hbp::1/0
```

```
sh %>%
  risk_ratio(., exposure = smoke, outcome = hbp, alpha = 0.05)
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Ratio 1.6 1.14 2.25 smoke::TRUE/FALSE hbp::1/0
```

```
sh %>%
  risk_diff(., exposure = smoke, outcome = hbp, alpha = 0.05)
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Risk Difference 0.15 0.0368 0.263 smoke::TRUE/FALSE hbp::1/0
```
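The estimates and intervals printed above can be reproduced by hand from the four cell counts. The sketch below assumes Wald-type intervals (computed on the log scale for the two ratios). The package's internal method is not described in this document, so this is an illustration that happens to match the printed values, and names such as `z`, `or_se`, and `rd_ci` are invented for the example.

```r
# Cell counts from the example table, in A/B/C/D notation.
A <- 40; B <- 60; C <- 50; D <- 150
alpha <- 0.05
z <- qnorm(1 - alpha / 2)  # critical value for a two-sided interval

# Odds ratio with a Wald interval on the log scale.
or    <- (A * D) / (B * C)
or_se <- sqrt(1 / A + 1 / B + 1 / C + 1 / D)
or_ci <- exp(log(or) + c(-1, 1) * z * or_se)

# Risk ratio with a Wald interval on the log scale.
risk_exposed   <- A / (A + B)
risk_unexposed <- C / (C + D)
rr    <- risk_exposed / risk_unexposed
rr_se <- sqrt(1 / A - 1 / (A + B) + 1 / C - 1 / (C + D))
rr_ci <- exp(log(rr) + c(-1, 1) * z * rr_se)

# Risk difference with a Wald interval on the natural scale.
rd    <- risk_exposed - risk_unexposed
rd_se <- sqrt(risk_exposed * (1 - risk_exposed) / (A + B) +
              risk_unexposed * (1 - risk_unexposed) / (C + D))
rd_ci <- rd + c(-1, 1) * z * rd_se

round(c(or, or_ci), 3)  # 2.000 1.198 3.338
round(c(rr, rr_ci), 3)  # 1.600 1.139 2.247
round(c(rd, rd_ci), 3)  # 0.150 0.037 0.263
```

Changing `alpha` in this sketch widens or narrows all three intervals through `z`, which is the role the `alpha` argument plays in the calls above.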

As with measures of effect, hypothesis tests (Fisher’s exact test for count data and Pearson’s \(\chi^2\) test) can be run on a `twoxtwo` object:

```
sh_2x2 %>%
  fisher()
# # A tibble: 1 x 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <dbl> <chr> <chr>
# 1 Fisher's … 2.00 1.16 3.44 NA NA 0.0108 smoke::T… hbp::1…
```

```
sh_2x2 %>%
  chisq()
# # A tibble: 1 x 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <lgl> <lgl> <lgl> <dbl> <int> <dbl> <chr> <chr>
# 1 Pearson's … NA NA NA 6.45 1 0.0111 smoke::… hbp::1…
```

Or *without* first creating a `twoxtwo` object:

```
sh %>%
  fisher(., exposure = smoke, outcome = hbp)
# # A tibble: 1 x 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <dbl> <chr> <chr>
# 1 Fisher's … 2.00 1.16 3.44 NA NA 0.0108 smoke::T… hbp::1…
```

```
sh %>%
  chisq(., exposure = smoke, outcome = hbp)
# # A tibble: 1 x 9
# test estimate ci_lower ci_upper statistic df pvalue exposure outcome
# <chr> <lgl> <lgl> <lgl> <dbl> <int> <dbl> <chr> <chr>
# 1 Pearson's … NA NA NA 6.45 1 0.0111 smoke::… hbp::1…
```
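For comparison, the same two tests can be run on the aggregated counts with base R's `chisq.test()` and `fisher.test()`. This is a sketch under the assumption that the base functions are a fair stand-in; whether the package calls them internally is not stated here. With the default continuity correction, the chi-squared statistic agrees with the 6.45 printed above.

```r
# Counts from the example, laid out as exposure (rows) by outcome (columns).
counts <- matrix(c(40, 60,
                   50, 150),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(smoke = c("TRUE", "FALSE"),
                                 hbp = c("1", "0")))

# Pearson's chi-squared test; base R applies Yates' continuity
# correction to 2x2 tables by default (correct = TRUE).
chi <- chisq.test(counts)
round(unname(chi$statistic), 2)  # 6.45
unname(chi$parameter)            # degrees of freedom: 1

# Fisher's exact test reports a conditional maximum likelihood estimate
# of the odds ratio together with a confidence interval and p-value.
fis <- fisher.test(counts)
fis$estimate
fis$conf.int
fis$p.value
```

The layout of the output mirrors the columns shown above: the chi-squared test has a statistic and degrees of freedom but no estimate, while Fisher's test has an estimate and interval but no statistic.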

All processing of exposure and outcome requires that both variables have exactly two levels. By default, variables are coerced to factors and the order of their levels is reversed. The result is that, as in the example presented above, a value of `TRUE` or `1` will be oriented as exposure or outcome “+” and a corresponding value of `FALSE` or `0` will be oriented as exposure or outcome “-”.
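The effect of coercing to a factor and reversing the levels can be seen with base R alone. This is a minimal sketch of the idea rather than the package's code, and `exposure_levels` and `outcome_levels` are names chosen for the example.

```r
# Default factor levels are sorted, so the "-" value comes first ...
levels(factor(c(TRUE, FALSE)))  # "FALSE" "TRUE"
levels(factor(c(1, 0)))         # "0" "1"

# ... and reversing them puts the "+" value first.
exposure_levels <- rev(levels(factor(c(TRUE, FALSE))))  # "TRUE" "FALSE"
outcome_levels  <- rev(levels(factor(c(1, 0))))         # "1" "0"
```

The reversed vectors match the `levels` elements stored in the `exposure` and `outcome` components of the example object.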

The `twoxtwo()` constructor function is flexible enough to allow user-specified ordering via a named list passed to the “levels” argument:

```
sh %>%
  twoxtwo(.,
          exposure = smoke,
          outcome = hbp,
          levels = list(exposure = c(FALSE,TRUE), outcome = c(1,0)))
# | | |OUTCOME |OUTCOME |
# |:--------|:-----------|:-------|:-------|
# | | |hbp=1 |hbp=0 |
# |EXPOSURE |smoke=FALSE |50 |150 |
# |EXPOSURE |smoke=TRUE |40 |60 |
```

As mentioned above, the `twoxtwo()` function is wrapped by the other analysis functions. Each of these functions accepts all arguments that can be passed to `twoxtwo()`, including the “levels” parameter:

```
sh %>%
  odds_ratio(.,
             exposure = smoke,
             outcome = hbp,
             levels = list(exposure = c(FALSE,TRUE), outcome = c(1,0)))
# # A tibble: 1 x 6
# measure estimate ci_lower ci_upper exposure outcome
# <chr> <dbl> <dbl> <dbl> <chr> <chr>
# 1 Odds Ratio 0.5 0.300 0.835 smoke::FALSE/TRUE hbp::1/0
```
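Swapping the exposure levels exchanges the two rows of the table, so the odds ratio becomes the reciprocal of the earlier estimate and the interval bounds are inverted and swapped. The arithmetic below is a sketch that starts from the rounded bounds printed earlier; the variable names are illustrative only.

```r
# Odds ratio with smoke = TRUE as "exposure +", then with smoke = FALSE as "exposure +".
or_default  <- (40 * 150) / (60 * 50)   # 2
or_reversed <- (50 * 60) / (150 * 40)   # 0.5

# Inverting and swapping the original (rounded) bounds gives the new interval.
ci_default  <- c(1.198, 3.338)
ci_reversed <- rev(1 / ci_default)
round(ci_reversed, 3)  # 0.300 0.835
```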

1. Users can override this behavior with `twoxtwo(..., retain = FALSE)`.