Notice too that `joyn::merge`

masks `base::merge`

, which is ideal. Also, by loading `joyn`

you have a few tables lazy-loaded in memory, x1, x2, x3, x4, y1, y2, y3, and y4.

```
library(joyn)
library(data.table)
x1[]#> id t x
#> 1: 1 1 11
#> 2: 1 2 12
#> 3: 2 1 13
#> 4: 3 2 14
#> 5: NA NA 15
x2[]#> id t x
#> 1: 1 1 16
#> 2: 4 2 12
#> 3: 2 1 NA
#> 4: 3 2 NA
#> 5: NA NA 15
```

You can see in the examples below that by default `joyn`

displays a small summary table.

```
# Natural join. i.e., using as key all the variables that appear in both tables.
# without display the resulting table.
merge(x1, y1)
#> > removing key variables `id` from yvars
#>
#> -- JOYn Report --
#>
#> report n percent
#> 1: x 2 33.3%
#> 2: x & y 3 50%
#> 3: y 1 16.7%
#> 4: total 6 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id t x y report
#> 1: 1 1 11 11 x & y
#> 2: 1 2 12 11 x & y
#> 3: 2 1 13 15 x & y
#> 4: 3 2 14 NA x
#> 5: 4 NA NA 16 y
#> 6: NA NA 15 NA x
```

You can use the reporting format of the `report`

variables, by setting the argument `reporttype`

to numeric, similar to Stata’s output.

code | report | meaning |
---|---|---|

1 | x | Only available in x table |

2 | y | Only available in y table |

3 | x & y | Matching obs available in both tables |

4 | NA updated | NAs in x updated with actual values in variables with same names in y |

5 | value updated | Actual values and NAs in x updated with actual values in variables with same names in y |

6 | not updated | Actual values and NAs in x are NOT updated with actual values in y |

Since the object returned is a `data.table`

, we display the results in the console by concatenating a pair of squared brackets at the end of the call. That is `merge(x1, y1)[]`

instead of `merge(x1, y1)`

.

```
# Natural join. display the results.
merge(x1, y1)[]
#> > removing key variables `id` from yvars
#>
#> -- JOYn Report --
#>
#> report n percent
#> 1: x 2 33.3%
#> 2: x & y 3 50%
#> 3: y 1 16.7%
#> 4: total 6 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id t x y report
#> 1: 1 1 11 11 x & y
#> 2: 1 2 12 11 x & y
#> 3: 2 1 13 15 x & y
#> 4: 3 2 14 NA x
#> 5: 4 NA NA 16 y
#> 6: NA NA 15 NA x
# Making explicit the key variables (strongly suggested)
merge(x1, y1, by = "id")[]
#> > removing key variables `id` from yvars
#> -- JOYn Report --
#>
#> report n percent
#> 1: x 2 33.3%
#> 2: x & y 3 50%
#> 3: y 1 16.7%
#> 4: total 6 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id t x y report
#> 1: 1 1 11 11 x & y
#> 2: 1 2 12 11 x & y
#> 3: 2 1 13 15 x & y
#> 4: 3 2 14 NA x
#> 5: 4 NA NA 16 y
#> 6: NA NA 15 NA x
```

If the key variables have different names in both tables, you can create the equivalency as an element of `by`

vector.

```
# joining by id1 in x and id in y
merge(x4, y4, by = c("id1 = id"))
#> > removing key variables `keyby1` from yvars
#> i variables `id2` and `x` in table y are ignored because arguments `update_NAs`
#> and `update_values` are FALSE.
#>
#> -- JOYn Report --
#>
#> report n percent
#> 1: x & y 5 71.4%
#> 2: y 2 28.6%
#> 3: total 7 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id1 id2 t x y report
#> 1: 1 1 1 16 11 x & y
#> 2: 1 1 2 12 11 x & y
#> 3: 2 2 1 NA 15 x & y
#> 4: 3 3 2 NA 10 x & y
#> 5: 3 4 NA 15 10 x & y
#> 6: 5 NA NA NA 20 y
#> 7: 6 NA NA NA 13 y
# joining by id1 in x and id in y, and id2 in both
merge(x4, y4, by = c("id1 = id", "id2"))
#> > removing key variables `keyby1` and `id2` from yvars
#> i variable `x` in table y is ignored because arguments `update_NAs` and
#> `update_values` are FALSE.
#>
#> -- JOYn Report --
#>
#> report n percent
#> 1: x 2 25%
#> 2: x & y 3 37.5%
#> 3: y 3 37.5%
#> 4: total 8 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id1 id2 t x y report
#> 1: 1 1 1 16 11 x & y
#> 2: 1 1 2 12 11 x & y
#> 3: 2 2 1 NA NA x
#> 4: 2 1 NA NA 15 y
#> 5: 3 3 2 NA NA x
#> 6: 3 4 NA 15 10 x & y
#> 7: 5 2 NA NA 20 y
#> 8: 6 3 NA NA 13 y
```

Match type refers to the correspondence that exists between the observations of the joining tables. Following Stata’s convention, we could have four different match types, one-to-one (1:1) , one-to-many (1:m), many-to-one (m:1), and many-to-many (m:m) joins. The default is m:m, following general R’s practice, but its use is **highly discouraged**. We recommend you always specify the match type when joining tables to ensure the output is correct. Indeed, if you don’t care about match types or you don’t think it is necessary to use them for your particular needs, you might be fine without `joyn`

.

Using the same wording of the Stata manual

**1:1**: specifies a one-to-one match merge. The variables specified in`by`

*uniquely identify*single observations in both table.**1:m and m:1**: specify*one-to-many*and*many-to-one*match merges, respectively. This means that in of the tables the observations are uniquely identify by the variables in`by,`

while in the other table many (two or more) of the observations are identify by the variables in`by`

.**m:m**refers to*many-to-many joins*. variables in`by`

does not uniquely identify the observations in either table. Matching is performed by combining observations with equal values in`by`

; within matching values, the first observation in the master (i.e. left or x) table is matched with the first matching observation in the using (i.e. right or y) table; the second, with the second; and so on. If there is an unequal number of observations within a group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group.

Suppose you think your data is uniquely identified by variable `id`

, it is not . If you used `match_type = "1:1"`

you will get and error, letting you know that something is not right.

```
# Merging correctly but getting error because something is not right in the data
merge(x3, y3, by = "id", match_type = "1:1")
#> > table y is not uniquely identified by `id`
#> Error: match type inconsistency
#> i you could use `return_report = TRUE` in `joyn::is_id()`
#> to see where the problem is
# Merging wrongly but getting NO errors because you did not use match_type
merge(x3, y3, by = "id")
#> > removing key variables `id` from yvars
#>
#> -- JOYn Report --
#>
#> report n percent
#> 1: x 1 20%
#> 2: x & y 3 60%
#> 3: y 1 20%
#> 4: total 5 100%
#> ---------------------------------------------------------- End of JOYn report --
#> id v foo y report
#> 1: a NA NA 20 y
#> 2: b 9 2 15 x & y
#> 3: c 8 4 11 x & y
#> 4: c 8 4 18 x & y
#> 5: d 10 7 NA x
```

Join type refers to observations that are kept after the join. To avoid confusion with argument `match_type`

, I decided not to name this argument as `join_type`

but as `keep`

. At the end, you want to specify which observations you want to keep. This argument plays the role of mimicking the behavior of `dplyr`

’s functions `left_join`

, `right_join`

, `inner_join`

, and `full_join`

, the default.

`keep = "full"`

, the default, keeps all the observations in x and y, regardless of whether they match or not.`# Full merge merge(x1, y1) #> > removing key variables `id` from yvars #> #> -- JOYn Report -- #> #> report n percent #> 1: x 2 33.3% #> 2: x & y 3 50% #> 3: y 1 16.7% #> 4: total 6 100% #> ---------------------------------------------------------- End of JOYn report -- #> id t x y report #> 1: 1 1 11 11 x & y #> 2: 1 2 12 11 x & y #> 3: 2 1 13 15 x & y #> 4: 3 2 14 NA x #> 5: 4 NA NA 16 y #> 6: NA NA 15 NA x`

`keep = "left"`

or`keep = "master"`

keeps all observations in x, regardless of whether they match or not with y.`# keep obs in x merge(x1, y1, keep = "left") #> > removing key variables `id` from yvars #> #> -- JOYn Report -- #> #> report n percent #> 1: x 2 40% #> 2: x & y 3 60% #> 3: total 5 100% #> ---------------------------------------------------------- End of JOYn report -- #> id t x y report #> 1: 1 1 11 11 x & y #> 2: 1 2 12 11 x & y #> 3: 2 1 13 15 x & y #> 4: 3 2 14 NA x #> 5: NA NA 15 NA x`

`keep = "right"`

or`keep = "using"`

keeps all observations in y, regardless of whether they match or not with x.`# keep obs in y merge(x1, y1, keep = "right") #> > removing key variables `id` from yvars #> #> -- JOYn Report -- #> #> report n percent #> 1: x & y 3 75% #> 2: y 1 25% #> 3: total 4 100% #> ---------------------------------------------------------- End of JOYn report -- #> id t x y report #> 1: 1 1 11 11 x & y #> 2: 1 2 12 11 x & y #> 3: 2 1 13 15 x & y #> 4: 4 NA NA 16 y`

`keep = "inner"`

keeps only the observations that match in both tables.`# keep matching obs in both tables merge(x1, y1, keep = "inner") #> > removing key variables `id` from yvars #> #> -- JOYn Report -- #> #> report n percent #> 1: x & y 3 100% #> 2: total 3 100% #> ---------------------------------------------------------- End of JOYn report -- #> id t x y report #> 1: 1 1 11 11 x & y #> 2: 1 2 12 11 x & y #> 3: 2 1 13 15 x & y`