This is a brief introduction to the package ` fdapace` (Carroll et al. 2020). For a general overview on functional data analysis (FDA) see (Wang, Chiou, and Müller 2016) and key references for the PACE approach and the associated dynamics are (Yao et al. 2003; Yao, Müller, and Wang 2005; Liu and Müller 2009; Müller and Yao 2010; Li and Hsing 2010; Zhang and Wang 2016, 2018). The basic work-flow behind the PACE approach for sparse functional data is as follows (see e.g. (Yao, Müller, and Wang 2005; Liu and Müller 2009) for more information):

- Calculate the smoothed mean \(\hat{\mu}\) (using local linear smoothing) aggregating all the available readings together.
- Calculate for each curve separately its own raw covariance and then aggregate all these raw covariances to generate the sample raw covariance.
- Use the off-diagonal elements of the sample raw covariance to estimate the smooth covariance.
- Perform eigenanalysis on the smoothed covariance to obtain the estimated eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\), then project that smoothed covariance on a positive semi-definite surface (Hall, Müller, and Yao 2008).
- Use Conditional Expectation (PACE step) to estimate the corresponding scores \(\hat{\xi}\). ie. \(\hat{\xi}_{ik} = \hat{E}[\hat{\xi}_{ik}|Y_i] = \hat{\lambda}_k \hat{\phi}_{ik}^T \Sigma_{Y_i}^{-1}(Y_i-\hat{\mu}_i)\).

As a working assumption a dataset is treated as sparse if it has on average less than 20, potentially irregularly sampled, measurements per subject. A user can manually change the automatically determined ` dataType` if that is necessary. For densely observed functional data simplified procedures are available to obtain the eigencomponents and associated functional principal components scores (see eg. (Castro, Lawton, and Sylvestre 1986) for more information). In particular in this case we:

- Calculate the cross-sectional mean \(\hat{\mu}\).
- Calculate the cross-sectional covariance surface (which is guaranteed to be positive semi-definite).
- Perform eigenanalysis on the covariance to estimate the eigenfunctions \(\hat{\phi}\) and eigenvalues \(\hat{\lambda}\).
- Use numerical integration to estimate the corresponding scores \(\hat{\xi}\). ie. \(\hat{\xi}_{ik} = \int_0^T [ y(t) - \hat{\mu}(t)] \phi_i(t) dt\)

In the case of sparse FPCA the most computational intensive part is the smoothing of the sample’s raw covariance function. For this, we employ a local weighted bilinear smoother.

A sibling MATLAB package for ` fdapace` can be found here.

The simplest scenario is that one has two lists ` yList` and

`tList`

`yList`

`tList`

The generated ` FPCAobj` will contain all the basic information regarding the desired FPCA.

```
library(fdapace)
# Set the number of subjects (N) and the
# number of measurements per subjects (M)
N <- 200;
M <- 100;
set.seed(123)
# Define the continuum
s <- seq(0,10,length.out = M)
# Define the mean and 2 eigencomponents
meanFunct <- function(s) s + 10*exp(-(s-5)^2)
eigFunct1 <- function(s) +cos(2*s*pi/10) / sqrt(5)
eigFunct2 <- function(s) -sin(2*s*pi/10) / sqrt(5)
# Create FPC scores
Ksi <- matrix(rnorm(N*2), ncol=2);
Ksi <- apply(Ksi, 2, scale)
Ksi <- Ksi %*% diag(c(5,2))
# Create Y_true
yTrue <- Ksi %*% t(matrix(c(eigFunct1(s),eigFunct2(s)), ncol=2)) + t(matrix(rep(meanFunct(s),N), nrow=M))
```

```
L3 <- MakeFPCAInputs(IDs = rep(1:N, each=M), tVec=rep(s,N), t(yTrue))
FPCAdense <- FPCA(L3$Ly, L3$Lt)
# Plot the FPCA object
plot(FPCAdense)
```

`## [1] 5.050606 1.999073`

```
# Create sparse sample
# Each subject has one to five readings (median: 3)
set.seed(123)
ySparse <- Sparsify(yTrue, s, sparsity = c(1:5))
# Give your sample a bit of noise
ySparse$yNoisy <- lapply(ySparse$Ly, function(x) x + 0.5*rnorm(length(x)))
# Do FPCA on this sparse sample
# Notice that sparse FPCA will smooth the data internally (Yao et al., 2005)
# Smoothing is the main computational cost behind sparse FPCA
FPCAsparse <- FPCA(ySparse$yNoisy, ySparse$Lt, list(plot = TRUE))
```

` FPCA` calculates the bandwidth utilized by each smoother using generalised cross-validation or \(k\)-fold cross-validation automatically. Dense data are not smoothed by default. The argument

`methodMuCovEst`

`smooth`

`cross-sectional`

The bandwidth used for estimating the smoothed mean and the smoothed covariance are available under ` ...bwMu` and

`bwCov`

Visualising the fitted trajectories is a good way to see if the new bandwidth made any sense:

` FPCA` uses a Gaussian kernel when smoothing sparse functional data; other kernel types (eg. Epanechnikov/

`epan`

`?FPCA`

`$optns\$kernel`

`gauss`

`rect`

` FPCAsparseRect <- FPCA(ySparse$yNoisy, ySparse$Lt, optns = list(kernel = 'rect')) # Use rectangular kernel`

` FPCA` returns automatically the smallest number of components required to explain 99% of a sample’s variance. Using the function

`selectK`

```
## $K
## [1] 3
##
## $criterion
## [1] 0.9876603
```

```
## $K
## [1] 2
##
## $criterion
## [1] 993.4442
```

When working with functional data (usually not very sparse) the estimation of derivatives is often of interest. Using ` fitted.FPCA` one can directly obtain numerical derivatives by defining the appropriate order

`p`

`fdapace`

`p =1`

`2`

`?fitted.FPCA`

`FPCAder`

`FPCA`

```
fittedCurvesP0 <- fitted(FPCAsparse) # equivalent: fitted(FPCAsparse, derOptns=list(p = 0));
# Get first order derivatives of fitted curves, smooth using Epanechnikov kernel
fittedCurcesP1 <- fitted(FPCAsparse, derOptns=list(p = 1, kernelType = 'epan'))
```

```
## Warning in fitted.FPCA(FPCAsparse, derOptns = list(p = 1, kernelType = "epan")): Potentially you use too many components to estimate derivatives.
## Consider using SelectK() to find a more informed estimate for 'K'.
```

We use the ` medfly25` dataset that this available with

`fdapace`

`FPCA`

`medfly25`

```
# load data
data(medfly25)
# Turn the original data into a list of paired amplitude and timing lists
Flies <- MakeFPCAInputs(medfly25$ID, medfly25$Days, medfly25$nEggs)
fpcaObjFlies <- FPCA(Flies$Ly, Flies$Lt, list(plot = TRUE, methodMuCovEst = 'smooth', userBwCov = 2))
```

Based on the scree-plot we see that the first three components appear to encapsulate most of the relevant variation. The number of eigencomponents to reach a 99.99% FVE is \(11\) but just \(3\) eigencomponents are enough to reach a 95.0%. We can easily inspect the following visually, using the ` CreatePathPlot` command.

`## Loading required package: ks`

```
par(mfrow=c(1,2))
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), main = 'K = 11', pch = 4); grid()
CreatePathPlot(fpcaObjFlies, subset = c(3,5,135), K = 3, main = 'K = 3', pch = 4) ; grid()
```

One can perform outlier detection (Febrero, Galeano, and González-Manteiga 2007) as well as visualize data using a functional box-plot. To achieve these tasks one can use the functions ` CreateOutliersPlot` and

`CreateFuncBoxPlot`

`KDE`

`bagplot`

`CreateOutliersPlot`

` CreateFuncBoxPlot(fpcaObjFlies, xlab = 'Days', ylab = '# of eggs laid', optns = list(K =3, variant='bagplot'))`

Functional data lend themselves naturally to questions about their rate of change; their derivatives. As mentioned previously using ` fdapace` one can generate estimates of the sample’s derivatives (

`fitted.FPCA`

`FPCAder`

`derOptns`

Carey, JR, P Liedo, H-G Müller, J-L Wang, and J-M Chiou. 1998. “Relationship of Age Patterns of Fecundity to Mortality, Longevity, and Lifetime Reproduction in a Large Cohort of Mediterranean Fruit Fly Females.” *The Journals of Gerontology Series A: Biological Sciences and Medical Sciences* 53 (4): B245–B251.

Carroll, Cody, Alvaro Gajardo, Yaqing Chen, Xiongtao Dai, Jianing Fan, Pantelis Z. Hadjipantelis, Kyunghee Han, Hao Ji, Hans-Georg Mueller, and Jane-Ling Wang. 2020. *Fdapace: Functional Data Analysis and Empirical Dynamics*. https://github.com/functionaldata/tPACE.

Castro, PE, WH Lawton, and EA Sylvestre. 1986. “Principal Modes of Variation for Processes with Continuous Sample Curves.” *Technometrics* 28 (4): 329–37.

Febrero, M, P Galeano, and W González-Manteiga. 2007. “A Functional Analysis of Nox Levels: Location and Scale Estimation and Outlier Detection.” *Computational Statistics* 22 (3): 411–27.

Hall, P, H-G Müller, and F Yao. 2008. “Modelling Sparse Generalized Longitudinal Observations with Latent Gaussian Processes.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (4): 703–23.

Hyndman, RJ, and HL Shang. 2010. “Rainbow Plots, Bagplots, and Boxplots for Functional Data.” *Journal of Computational and Graphical Statistics* 19 (1).

Li, Y, and T Hsing. 2010. “Uniform Convergence Rates for Nonparametric Regression and Principal Component Analysis in Functional/Longitudinal Data.” *The Annals of Statistics* 38 (6): 3321–51.

Liu, B, and H-G Müller. 2009. “Estimating Derivatives for Samples of Sparsely Observed Functions, with Application to Online Auction Dynamics.” *Journal of the American Statistical Association* 104 (486): 704–17.

Müller, H-G, and F Yao. 2010. “Empirical Dynamics for Longitudinal Data.” *The Annals of Statistics* 38 (6): 3458–86.

Rousseeuw, PJ, I Ruts, and JW Tukey. 1999. “The Bagplot: A Bivariate Boxplot.” *The American Statistician* 53 (4): 382–87.

Wang, J-L, J-M Chiou, and H-G Müller. 2016. “Functional Data Analysis.” *Annual Review of Statistics and Its Application* 3: 257–95.

Yao, F, H-G Müller, A J Clifford, S R Dueker, J Follett, Y Lin, B A Buchholz, and JS Vogel. 2003. “Shrinkage Estimation for Functional Principal Component Scores with Application to the Population Kinetics of Plasma Folate.” *Biometrics* 59 (3): 676–85.

Yao, F, H-G Müller, and J-L Wang. 2005. “Functional Data Analysis for Sparse Longitudinal Data.” *Journal of the American Statistical Association* 100 (470): 577–90.

Zhang, X, and J-L Wang. 2016. “From Sparse to Dense Functional Data and Beyond.” *The Annals of Statistics* 44 (5): 2281–2321.

Zhang, X, and J–L Wang. 2018. “Optimal Weighting Schemes for Longitudinal and Functional Data.” *Statistics & Probability Letters* 138: 165–70.