**An S-Curve Method for Estimating Abrupt and Gradual Changepoints**

Reading through the first example (“Nile River Data”) is sufficient for a quick start.

The code here implements the method of Jiang *et al.* (2023) for
detecting and estimating changepoints in the mean. The method
approximates a step function by an S-curve (logistic function), which
makes it possible to obtain asymptotic standard errors and, from them,
confidence intervals and tests for changepoint locations and change
magnitudes. Both abrupt and gradual changes may be modeled.
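To make the step-function approximation concrete, here is a small base R sketch. The function names `sCurve` and `stepFun` are hypothetical, and the parametrization (pre-change mean, post-change mean, slope, changepoint) is an assumption chosen to mirror the quantities that **fitS** reports; it is not necessarily the exact form used inside the package.

```r
# Hypothetical S-curve mean (assumed parametrization, for illustration only):
#   mu(t) = preMean + (postMean - preMean) * plogis(slope * (t - changePt))
sCurve <- function(t, preMean, postMean, slope, changePt) {
  preMean + (postMean - preMean) * plogis(slope * (t - changePt))
}

# The exact step function that the S-curve approximates
stepFun <- function(t, preMean, postMean, changePt) {
  ifelse(t < changePt, preMean, postMean)
}

tt <- seq(0, 10, by = 0.01)
far <- abs(tt - 5) >= 0.5   # points at least 0.5 away from the changepoint
for (slope in c(1, 10, 100)) {
  gap <- abs(sCurve(tt, 0, 1, slope, 5) - stepFun(tt, 0, 1, 5))
  cat(sprintf("slope = %3g: max gap away from changepoint = %.2g\n",
              slope, max(gap[far])))
}
```

A large slope makes the curve essentially a step, while a small slope spreads the change out, which is how gradual changes are represented. Because the curve is smooth in the changepoint, nonlinear least-squares theory can supply the asymptotic standard errors mentioned above.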

Changes in slope and intercept in piecewise linear models can also be analyzed, again with confidence intervals for the various quantities of interest.
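The same smoothing idea carries over to piecewise linear models: a logistic weight switches the mean from one line to another. The sketch below uses a hypothetical function `sLinear` with an illustrative parametrization (intercepts `a1`, `a2`, slopes `b1`, `b2`, changepoint `cp`, S-curve steepness `s`); it is not the package's internal model function.

```r
# Hypothetical smoothed piecewise-linear mean (illustrative parametrization):
# line a1 + b1*t before changepoint cp, line a2 + b2*t after it,
# blended by a logistic weight with steepness s.
sLinear <- function(t, a1, b1, a2, b2, cp, s) {
  w <- plogis(s * (t - cp))   # near 0 well before cp, near 1 well after cp
  (1 - w) * (a1 + b1 * t) + w * (a2 + b2 * t)
}

# Far from the changepoint the curve follows each line in turn
sLinear(c(0, 10), a1 = 1, b1 = 2, a2 = 30, b2 = -1, cp = 5, s = 50)
# Exactly at the changepoint the two lines get equal weight
sLinear(5, a1 = 1, b1 = 2, a2 = 30, b2 = -1, cp = 5, s = 50)
```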

The **Nile** dataset, built into R, consists of yearly
measurements of the annual flow of the river at Aswan during 1871-1970.
Here is the code:

```
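library(changeS)
nile <- data.frame(t=1871:1970, ht=Nile)
# column 1 (t) is time and column 2 (ht) the response; the final
# argument fixes the S-curve slope at the large value 10
nileOut <- fitS(nile,1,2,10) # abrupt change model
nileOut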
# [1] "point estimates of the alpha_i"
# postMean preMean changePt
# 849.9696 1097.9298 1898.3814
# [1] "covariance matrix"
# postMean preMean changePt
# postMean 228.8062041 0.7764724 -0.5270315
# preMean 0.7764724 609.8103449 -11.2918541
# changePt -0.5270315 -11.2918541 6.1601174
# [1] "standard error of the difference between pre-changepoint and post-changepoint means"
# [1] 28.93205
```

Our model places an abrupt changepoint at about 1898.4, i.e., partway through 1898. This matches the fact that the British began construction of the Aswan Low Dam in that year. We modeled an abrupt change here by setting the S-curve slope to a large value, 10.

Here domain expertise would identify the location of the changepoint, so the main interest is in the magnitude of the change. The model detects a major drop in water flow around this point, from a mean of 1097.93 before the changepoint to 849.97 after it. An approximate 95% confidence interval for the change in mean is then

849.97 - 1097.93 ± 1.96 × 28.93205, or about (-304.7, -191.3).
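The standard error printed by **fitS** can be reproduced from the covariance matrix, since Var(post - pre) = Var(post) + Var(pre) - 2 Cov(post, pre). The base R sketch below copies the relevant estimates from the `nileOut` output above (rounded as printed) and forms the same Wald interval, using `qnorm(0.975)` in place of 1.96. The variable names are ours, not the package's.

```r
# Estimates and covariance entries copied (as printed) from nileOut above
est <- c(postMean = 849.9696, preMean = 1097.9298)
V <- matrix(c(228.8062041, 0.7764724,
              0.7764724,   609.8103449),
            nrow = 2, dimnames = list(names(est), names(est)))

a <- c(1, -1)                        # contrast: postMean - preMean
delta <- sum(a * est)                # estimated change in mean
se <- sqrt(drop(t(a) %*% V %*% a))   # Var(post) + Var(pre) - 2 Cov
ci <- delta + c(-1, 1) * qnorm(0.975) * se
round(c(delta = delta, se = se), 3)
round(ci, 2)                         # roughly -304.67 -191.25
```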

We can also plot the fit against the raw data:
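As a base R sketch (not the package's own plotting facility), the fitted curve can be overlaid on the data by plugging the printed estimates into an S-curve. The logistic parametrization is an assumption; with a slope of 10 the curve is essentially a step at 1898.4, so the exact form matters little here.

```r
# Raw data from R's built-in Nile series (annual flow, 1871-1970)
nile <- data.frame(t = 1871:1970, ht = as.numeric(Nile))

# Fitted mean, assuming mu(t) = pre + (post - pre) * plogis(slope * (t - cp)),
# with the estimates printed by fitS above and the fixed slope 10
pre <- 1097.9298; post <- 849.9696; cp <- 1898.3814
fitMean <- pre + (post - pre) * plogis(10 * (nile$t - cp))

plot(nile$t, nile$ht, pch = 16, cex = 0.6,
     xlab = "year", ylab = "annual flow (10^8 m^3)")
lines(nile$t, fitMean, col = "red", lwd = 2)
abline(v = cp, lty = 2)   # estimated changepoint
```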

We next apply the S-curve approach to data on the rates of breast cancer among women in Sweden. There had been speculation that such rates rise with the onset of menopause; see Pawitan (2005). We are grateful to Prof. Pawitan for making his data available.

While that paper considers an abrupt model, the relationship, if one exists, may be gradual. For a given woman, the transition to menopause is gradual. And even if it were abrupt, different women experience menopause at different ages, so the data would follow a mixture of abrupt changes, which is gradual overall. This makes the dataset a good use case for our S-curve approach.
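The mixture argument can be checked with a quick simulation. Suppose, purely hypothetically, that each woman's rate steps up abruptly at her own change age, and that change ages follow a logistic distribution centered at 50 with scale 3 (illustrative values, not estimates from the data). The population average of these 0/1 step functions is then the logistic CDF, which is exactly an S-curve.

```r
set.seed(1)
n <- 1e5
# Hypothetical individual change ages (illustrative values only)
changeAge <- rlogis(n, location = 50, scale = 3)

ages <- 35:65
# Fraction of women whose abrupt change has occurred by each age,
# i.e., the average of the individual step functions
mixMean <- sapply(ages, function(a) mean(changeAge < a))
# The S-curve predicted by theory: the logistic CDF
theory <- plogis(ages, location = 50, scale = 3)

round(cbind(age = ages, mixture = mixMean, sCurve = theory)[seq(1, length(ages), by = 5), ], 3)
```

With a different distribution of change ages (say, normal), the average is that distribution's CDF instead, which is still a smooth S-shaped curve.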

```
data(cancerRates)
crOut <- fitS(cancerRates,1,2)  # no slope argument: fitS estimates it
crOut
# [1] "point estimates of the alpha_i"
# postMean preMean slope changePt
# 8.965952 3.425623 1.178302 43.262817
# [1] "covariance matrix"
# postMean preMean slope changePt
# postMean 0.16734502 -0.1215118 -0.1318708 0.02302526
# preMean -0.12151178 1.1340273 0.5080011 0.44335717
# slope -0.13187077 0.5080011 0.4107214 0.16400967
# changePt 0.02302526 0.4433572 0.1640097 0.29987282
# [1] "standard error of the difference between pre-changepoint and post-changepoint means"
# [1] 1.242737
```

Here we did not specify an S-curve slope, asking
**fitS** to estimate it for us. This models a gradual
change.
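The printed estimates and covariance matrix are enough to form Wald intervals for the other quantities of interest as well. The sketch below, in base R with values copied from the `crOut` output above, computes approximate 95% intervals for the changepoint location (presumably in years of age) and for the change in mean; the variable names are ours, not the package's.

```r
# Values copied (as printed) from the crOut output above
postMean <- 8.965952; preMean <- 3.425623; changePt <- 43.262817
varPost <- 0.16734502; varPre <- 1.1340273; covPostPre <- -0.1215118
varChangePt <- 0.29987282

crit <- qnorm(0.975)   # about 1.96

# Approximate 95% Wald interval for the changepoint location
cpCI <- changePt + c(-1, 1) * crit * sqrt(varChangePt)
round(cpCI, 2)         # roughly 42.19 44.34

# Approximate 95% Wald interval for the change in mean
seDiff <- sqrt(varPost + varPre - 2 * covPostPre)   # matches 1.242737
diffCI <- (postMean - preMean) + c(-1, 1) * crit * seDiff
round(diffCI, 2)       # roughly 3.10 7.98
```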

We can plot the output:

Our next dataset comes from a study of the impact of Medicare, the US health insurance program primarily for people aged 65 and older. One nominally qualifies at age 65, though enrollment can occur earlier or later. Here we consider 90-day mortality in relation to age.

```
# get Medicare data of David Card et al
con <- url('https://github.com/108michael/ms_thesis/raw/master/medicare.Rdata')
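load(con)  # loads the data frame 'medicare'
# reorder so that age is column 1 (x) and 90-day mortality column 2 (y)
z <- fitS_linear(medicare[,c(4,3)],1,2)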
summary(z)
# Formula: y ~ big_linear_guy(b1, h1, s1, c, b2, h2, s2, x = x)
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# b1 0.60490 0.07135 8.478 9.87e-14 ***
# h1 0.78810 0.18587 4.240 4.59e-05 ***
# s1 9.59339 717.41168 0.013 0.98935
# c 65.99895 0.10660 619.109 < 2e-16 ***
# b2 -23.93643 4.40366 -5.436 3.19e-07 ***
# h2 -37.23297 12.69091 -2.934 0.00406 **
# s2 2.89020 1.76829 1.634 0.10495
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.6417 on 113 degrees of freedom
#
# Number of iterations to convergence: 91
# Achieved convergence tolerance: 1.49e-08
```

As we would hope, mortality declined after people became eligible for Medicare. For instance, the intercept went down from -23.94 to -37.23 (the estimates labeled b2 and h2 above). Interestingly, the estimated changepoint is close to 66 rather than 65, suggesting that many people started the program a little later.

An approximate 95% confidence interval for the location of the changepoint is 65.99895 ± 1.96 × 0.10660, or about (65.79, 66.21).
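The same arithmetic in base R, with the estimate and standard error copied from `summary(z)` above and `qnorm(0.975)` in place of 1.96. The final check makes the point in the text explicit: the nominal eligibility age of 65 lies outside the interval.

```r
# Estimate and standard error of the changepoint c, copied from summary(z)
cHat <- 65.99895
seC <- 0.10660
ciC <- cHat + c(-1, 1) * qnorm(0.975) * seC
round(ciC, 2)                  # 65.79 66.21

# Does the interval contain the nominal eligibility age of 65?
65 >= ciC[1] && 65 <= ciC[2]   # FALSE

# summary(z) has the format of an nls fit, so z is presumably an nls object;
# if so, stats::confint.default(z, "c") should give the same Wald interval
# directly (not run here, since it needs the downloaded data).
```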

There are many R packages for changepoint analysis. Here we compare
**changeS** with two of them, the **changepoints** and **mcp**
packages.

| | **changepoints** | **changeS** | **mcp** |
|---|---|---|---|
| Types of changes | abrupt only | abrupt and gradual | abrupt only |
| Changepoint location domain | integers | continuous | continuous |
| Statistical basis | several methods; the general theme is a series of statistical hypothesis tests at various candidate integer locations | nonlinear least-squares estimation | Bayesian estimation |
| Uncertainty analysis (changepoint locations and magnitudes) | no confidence intervals | confidence intervals | Bayesian credible intervals |

Jiang, Lan, Collin Kennedy, and Norman Matloff (2023). An S-Curve Method for Abrupt and Gradual Changepoint Analysis. SDSS 2023.

Pawitan, Yudi (2005). Change-point Problem. In *Encyclopedia of Biostatistics*. https://doi.org/10.1002/0470011815.b2a12011