# How to use the R2ucare package to assess the fit of capture-recapture to data?

## What it does (and does not do)

Ths package contains R functions to perform goodness-of-fit tests for capture-recapture models. It also has various functions to manipulate capture-recapture data. Please email all suggestions for improvements, questions, comments and bugs to olivier.gimenez [AT] cefe.cnrs.fr.

For Cormack-Jolly-Seber models (single-state), we refer to Lebreton et al. (1992) and Pradel et al. (2005) for the theory. For Arnason-Schwarz models (multistate), have a look to Pradel et al. (2003). Chapter 5 of the Gentle Introduction to MARK also provides a good start for understanding goodness-of-fit test.

Warning: to date, no goodness-of-fit test exists for models with individual covariates (unless you discretize them and use groups), individual time-varying covariates (unless you treat them as states) or temporal covariates; therefore, remove these covariates from your dataset before using it with R2ucare. For groups, just treat the group separately as in the Dipper example below.

## To install the package

The latest stable version of the package can be downloaded from CRAN with the R command

install.packages("unmarked")

The repository on GitHub https://github.com/oliviergimenez/R2ucare hosts the development version of the package, to install it:

if(!require(devtools)) install.packages("devtools")
library("devtools")
install_github('oliviergimenez/R2ucare')

Despite what its name might suggest, you do not need to download and install U-CARE to run the R2ucare package. This package is basically a Matlab to R translation of U-CARE (Choquet et al. 2009).

## Goodness-of-fit tests for the Cormack-Jolly-Seber model

First things first, load the R2ucare package:

library(R2ucare)

Let’s begin by reading in the classical dipper using the read_inp function. Note that data with the Headed format can also be read using the function read_headed. The dataset is provided with the package (thanks to Gilbert Marzolin for sharing his data).

dipper = system.file("extdata", "ed.inp", package = "R2ucare")
dipper = read_inp(dipper,group.df=data.frame(sex=c('Male','Female')))

Get encounter histories, counts and groups:

dip.hist = dipper$encounter_histories dip.freq = dipper$sample_size
dip.group = dipper$groups Split the dataset in females/males: mask = (dip.group == 'Female') dip.fem.hist = dip.hist[mask,] dip.fem.freq = dip.freq[mask] mask = (dip.group == 'Male') dip.mal.hist = dip.hist[mask,] dip.mal.freq = dip.freq[mask] Tadaaaaan, now you’re ready to perform Test.3Sr, Test3.Sm, Test2.Ct and Test.Cl for females: test3sr_females = test3sr(dip.fem.hist, dip.fem.freq) test3sm_females = test3sm(dip.fem.hist, dip.fem.freq) # we need the m-array to perform test2ct and test2cl test2ct_females = test2ct(dip.fem.hist, dip.fem.freq) test2cl_females = test2cl(dip.fem.hist, dip.fem.freq) # display results: test3sr_females ##$test3sr
##      stat        df     p_val sign_test
##     4.985     5.000     0.418     1.428
##
## $details ## component stat p_val signed_test test_perf ## 1 2 0.858 0.354 0.926 Chi-square ## 2 3 3.586 0.058 1.894 Chi-square ## 3 4 0.437 0.509 0.661 Chi-square ## 4 5 0.103 0.748 -0.321 Chi-square ## 5 6 0.001 0.982 0.032 Chi-square test3sm_females ##$test3sm
##  stat    df p_val
## 2.041 3.000 0.564
##
## $details ## component stat df p_val test_perf ## 1 2 1.542 1 0.214 Fisher ## 2 3 0 1 1 Fisher ## 3 4 0.499 1 0.48 Fisher ## 4 5 0 0 0 None ## 5 6 0 0 0 None test2ct_females ##$test2ct
##      stat        df     p_val sign_test
##     3.250     4.000     0.517    -0.902
##
## $details ## component dof stat p_val signed_test test_perf ## 1 2 1 0 1 0 Fisher ## 2 3 1 0 1 0 Fisher ## 3 4 1 0 1 0 Fisher ## 4 5 1 3.25 0.071 -1.803 Fisher test2cl_females ##$test2cl
##  stat    df p_val
##     0     0     1
##
## $details ## component dof stat p_val test_perf ## 1 2 0 0 0 None ## 2 3 0 0 0 None ## 3 4 0 0 0 None Or perform all tests at once: overall_CJS(dip.fem.hist, dip.fem.freq) ## chi2 degree_of_freedom p_value ## Gof test for CJS model: 10.276 12 0.592 ## Goodness-of-fit tests for the Arnason-Schwarz model Read in the geese dataset. It is provided with the package (thanks to Jay Hesbeck for sharing his data). geese = system.file("extdata", "geese.inp", package = "R2ucare") geese = read_inp(geese) Get encounter histories and number of individuals with corresponding histories geese.hist = geese$encounter_histories
geese.freq = geese$sample_size And now, perform Test3.GSr, Test.3.GSm, Test3G.wbwa, TestM.ITEC and TestM.LTEC: test3Gsr(geese.hist,geese.freq) ##$test3Gsr
##    stat      df   p_val
## 117.753  12.000   0.000
##
## $details ## occasion site stat df p_val test_perf ## 1 2 1 3.894777e-03 1 9.502378e-01 Chi-square ## 2 2 2 2.715575e-04 1 9.868523e-01 Chi-square ## 3 2 3 8.129814e+00 1 4.354322e-03 Chi-square ## 4 3 1 1.139441e+01 1 7.366526e-04 Chi-square ## 5 3 2 2.707742e+00 1 9.986223e-02 Chi-square ## 6 3 3 3.345916e+01 1 7.277633e-09 Chi-square ## 7 4 1 1.060848e+01 1 1.125702e-03 Chi-square ## 8 4 2 3.533332e-01 1 5.522323e-01 Chi-square ## 9 4 3 1.016778e+01 1 1.429165e-03 Chi-square ## 10 5 1 1.101349e+01 1 9.045141e-04 Chi-square ## 11 5 2 1.292013e-01 1 7.192616e-01 Chi-square ## 12 5 3 2.978513e+01 1 4.826802e-08 Chi-square test3Gsm(geese.hist,geese.freq) ##$test3Gsm
##    stat      df   p_val
## 302.769 119.000   0.000
##
## $details ## occasion site stat df p_val test_perf ## 1 2 1 23.913378 14 4.693795e-02 Fisher ## 2 2 2 24.810007 16 7.324561e-02 Fisher ## 3 2 3 11.231939 8 1.889004e-01 Fisher ## 4 3 1 36.521484 14 8.712879e-04 Fisher ## 5 3 2 21.365358 17 2.103727e-01 Chi-square ## 6 3 3 23.072982 10 1.048037e-02 Fisher ## 7 4 1 55.338866 8 3.793525e-09 Fisher ## 8 4 2 17.172011 11 1.028895e-01 Fisher ## 9 4 3 45.089296 10 2.095523e-06 Chi-square ## 10 5 1 9.061514 3 2.848411e-02 Chi-square ## 11 5 2 5.974357 4 2.010715e-01 Chi-square ## 12 5 3 29.217786 4 7.060092e-06 Chi-square test3Gwbwa(geese.hist,geese.freq) ##$test3Gwbwa
##    stat      df   p_val
## 472.855  20.000   0.000
##
## $details ## occasion site stat df p_val test_perf ## 1 2 1 19.5914428 2 5.568936e-05 Chi-square ## 2 2 2 37.8676763 2 5.986026e-09 Chi-square ## 3 2 3 4.4873614 1 3.414634e-02 Fisher ## 4 3 1 80.5903050 1 2.777187e-19 Chi-square ## 5 3 2 98.7610833 4 1.805369e-20 Chi-square ## 6 3 3 0.8071348 1 3.689687e-01 Fisher ## 7 4 1 27.7054638 1 1.412632e-07 Chi-square ## 8 4 2 53.6936048 2 2.190695e-12 Chi-square ## 9 4 3 25.2931602 1 4.924519e-07 Chi-square ## 10 5 1 43.6547442 1 3.917264e-11 Chi-square ## 11 5 2 50.9264976 2 8.738795e-12 Chi-square ## 12 5 3 29.4763896 2 3.974507e-07 Chi-square testMitec(geese.hist,geese.freq) ##$testMitec
##   stat     df  p_val
## 68.224 27.000  0.000
##
## $details ## occasion stat df p_val test_perf ## 1 2 14.26712 9 0.1131354478 Chi-square ## 2 3 30.83835 9 0.0003155366 Chi-square ## 3 4 23.11879 9 0.0059349648 Chi-square testMltec(geese.hist,geese.freq) ##$testMltec
##   stat     df  p_val
## 20.989 19.000  0.337
##
## \$details
##   occasion      stat df     p_val  test_perf
## 1        2 14.102324 10 0.1683752 Chi-square
## 2        3  6.886312  9 0.6489547 Chi-square

Or all tests at once:

overall_JMV(geese.hist,geese.freq)
##                            chi2 degree_of_freedom p_value
## Gof test for JMV model: 982.589               197       0

## Various useful functions

Several U-CARE functions to manipulate and process capture-recapture data can be mimicked with R built-in functions. For example, recoding multisite/state encounter histories in single-site/state ones is easy:

# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist>1] = 1

So is recoding states:

# Assuming the geese dataset has been read in R (see above):
geese.hist[geese.hist==3]=2 # all 3s become 2s

Also, reversing time is not that difficult:

# Assuming the female dipper dataset has been read in R (see above):
t(apply(dip.fem.hist,1,rev))

What about cleaning data, i.e. deleting empty histories, and histories with no individuals?

# Assuming the female dipper dataset has been read in R (see above):
mask = (apply(dip.fem.hist,1,sum)>0 & dip.fem.freq>0) # select non-empty histories, and histories with at least one individual
sum(!mask) # how many histories are to be dropped?
dip.fem.hist[mask,] # drop these histories from dataset
dip.fem.freq[mask] # from counts

Selecting or gathering occasions is as simple as that:

# Assuming the female dipper dataset has been read in R (see above):
dip.fem.hist[,c(1,4,6)] # pick occasions 1, 4 and 6 (might be a good idea to clean the resulting dataset)
gather_146 = apply(dip.fem.hist[,c(1,4,6)],1,max) # gather occasions 1, 4 and 6 by taking the max
dip.fem.hist[,1] = gather_146 # replace occasion 1 by new occasion
dip.fem.hist = dip.fem.hist[,-c(4,6)] # drop occasions 4 and 6

Finally, suppressing the first encounter is achieved as follows:

# Assuming the geese dataset has been read in R (see above):
for (i in 1:nrow(geese.hist)){
occasion_first_encounter = min(which(geese.hist[i,]!=0)) # look for occasion of first encounter
geese.hist[i,occasion_first_encounter] = 0 # replace the first non zero by a zero
}
# delete empty histories from the new dataset
mask = (apply(geese.hist,1,sum)>0) # select non-empty histories
sum(!mask) # how many histories are to be dropped?
geese.hist[mask,] # drop these histories from dataset
geese.freq[mask] # from counts

The R packages plyr, dplyr and tidyr might also be very useful to work with capture-recapture datasets.

Besides these simple manipulations, several useful U-CARE functions need a proper R equivalent. I have coded a few of them, see the list below and ?name-of-the-function for more details.

• marray build the m-array for single-site/state capture-recapture data;
• multimarray build the m-array for multi-site/state capture-recapture data;
• group_data pool together individuals with the same encounter capture-recapture history.
• ungroup_data split encounter capture-recapture histories in individual ones.