A number of R packages exist for computing distances between pairs of
persistence diagrams, including **TDA** (Fasy et al. 2021), **rgudhi**
(Stamm 2023) and
**TDApplied**. Comparing the speed of these calculations
was performed in the “Benchmarking and Speed” package vignette, but here
we treat the more basic question of “are these distance calculations the
same across packages?” Through examples we show that the answer is
unfortunately no, but through exploration we attempt to reconcile these
differences and provide guidelines for using the different packages.
Moreover, we include a proof of algorithm correctness for
**TDApplied**’s distance function and in the following
section we describe why we do not compare **TDApplied**’s
distance function against that of the **TDAstats** package
(Wadhwa et al. 2019).

`phom.dist`

FunctionThe **TDAstats** package has a (wasserstein) distance
function, `phom.dist`

, which is described in its package
documentation as being “not meaningful without a null distribution”. But
what does this mean? We examined the source code for
**TDAstats**, i.e. its R/inference.R file, and found that
the internal function `wass_workhorse`

on lines 77-103 is the
actual distance calculation. In this function, the persistence values
(i.e. death - birth) for topological features of two diagrams (and their
diagonal projections appended to the opposite diagram) are ordered from
largest to smallest. The persistence values are then paired between the
two diagrams according to their orderings, and the absolute differences
of these ordered persistence values are exponentiated and summed. This
algorithm is not guaranteed to produce an optimal matching of the
topological features (in the sense of the usual wasserstein cost
function), and is missing the final exponentiation (in the case of the
2-wasserstein metric we take the square root of the sum of squared
distances). These differences to the standard wasserstein formulation
were clear to the authors of **TDAstats**, and while their
`phom.dist`

function is not exactly a wasserstein metric it
is still a comparison statistic of two persistence diagrams which can be
studied with inference techniques. Nevertheless, these differences are
the reason why in **TDApplied** documentation we refer to
**TDAstats**’ distance calculation as being “non-standard”
and why comparisons (of calculations and benchmarking) are not made
against it.

In order to compare the distance calculations of
**TDA**, **rgudhi** and
**TDApplied**, we will specify three simple diagrams (the
same D1, D2 and D3 from the package vignette “TDApplied Theory and
Practice”) and consider distances between each of the three possible
pairs. The three diagrams are as follows.

D1’s point is \((2,3)\), D2’s points are \(\{(2,3.3),(0,0.5)\}\), and D3’s point is \((0,0.5)\).

In order to compare the distance calculations of the packages we need to have the correct values of the distances. Using the infinity-norm distance for calculating matchings in the bottleneck and wasserstein distances as in (Kerber, Morozov, and Nigmetov 2017; Edelsbrunner and Harer 2010), we get the following optimal matchings and distance values:

- Between D1 and D2 we match D1’s \((2,3)\) with D2’s \((2,3.3)\), and D2’s \((0,0.5)\) with its diagonal projection \((0.25,0.25)\). Therefore, the bottleneck distance is 0.3 and the wasserstein distance is \(\sqrt{0.3^2+0.25^2}=\sqrt{0.1525}\approx 0.3905125\).
- Between D1 and D3 we match D1’s \((2,3)\) with its diagonal projection \((2.5,2.5)\), and D3’s \((0,0.5)\) with its diagonal projection \((0.25,0.25)\). Therefore, the bottleneck distance is 0.5 and the wasserstein distance is \(\sqrt{0.5^2+0.25^2}=\sqrt{0.3125}\approx 0.559017\).
- Between D2 and D3 we match D2’s \((0,0.5)\) with D3’s \((0,0.5)\), and D2’s \((2,3.3)\) with its diagonal projection \((2.65,2.65)\). Therefore, the bottleneck distance is 0.65 and the wasserstein distance is \(\sqrt{0.65^2}=0.65\).

For the Fisher information metric (Le and Yamada 2018) we will use the parameter \(\sigma = 1\) for simplicity. We must first calculate vectors \(\rho_1\) and \(\rho_2\), normalize them by dividing by the sum of their respective elements, then compute \(\mbox{cos}^{-1}(\sqrt{\rho_1} \cdot \sqrt{\rho_2})\) (where \(\mbox{cos}^{-1}\) is the arccos function). See (Le and Yamada 2018) for computational details.

- Between D1 and D2 we augment D1 to contain D2’s projection points and vice versa, obtaining diagrams \(D'_1 = \{(2,3),(2.65,2.65),(0.25,0.25)\}\) and \(D'_2 = \{(2,3.3),(0,0.5),(2.5,2.5)\}\). We compute \(\rho_1\) and \(\rho_2\) as

\[\begin{align} \rho_1 = & \{\mbox{exp}(0) + \mbox{exp}(-0.545/2) + \mbox{exp}(-10.625/2),\\ & \mbox{exp}(-0.545/2) + \mbox{exp}(0) + \mbox{exp}(-11.52/2),\\ & \mbox{exp}(-10.625/2) + \mbox{exp}(-11.52/2) + \mbox{exp}(0),\\ & \mbox{exp}(-0.09/2) + \mbox{exp}(-0.845/2) + \mbox{exp}(-12.365/2),\\ & \mbox{exp}(-10.25/2) + \mbox{exp}(-11.645/2) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-0.5/2) + \mbox{exp}(-0.045/2) + \mbox{exp}(-10.125/2)\}/(2\pi)\end{align}\]

\[\begin{align} \rho_2 = & \{\mbox{exp}(-0.09/2) + \mbox{exp}(-10.25/2) + \mbox{exp}(-0.5/2),\\ & \mbox{exp}(-0.845/2) + \mbox{exp}(-11.645/2) + \mbox{exp}(-0.045/2),\\ & \mbox{exp}(-12.365/2) + \mbox{exp}(-0.125/2) + \mbox{exp}(-10.125/2),\\ & \mbox{exp}(0) + \mbox{exp}(-11.84/2) + \mbox{exp}(-0.89/2),\\ & \mbox{exp}(-11.84/2) + \mbox{exp}(0) + \mbox{exp}(-10.25/2),\\ & \mbox{exp}(-0.89/2) + \mbox{exp}(-10.25/2) + \mbox{exp}(0)\}/(2\pi)\end{align}\]

Therefore the arccos of the dot product of the square root of the sum-normalized vectors is approximately 0.02354624.

- Between D1 and D3 we augment D1 to contain D3’s projection point and vice versa, obtaining diagrams \(D'_1 = \{(2,3),(0.25,0.25)\}\) and \(D'_3 = \{(0,0.5),(2.5,2.5)\}\). We compute \(\rho_1\) and \(\rho_2\) as

\[\begin{align} \rho_1 = & \{\mbox{exp}(0) + \mbox{exp}(-10.625/2),\\ & \mbox{exp}(-10.625/2) + \mbox{exp}(0),\\ & \mbox{exp}(-10.25/2) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-0.5/2) + \mbox{exp}(-10.125/2)\}/(2\pi)\end{align}\]

\[\begin{align} \rho_2 = & \{\mbox{exp}(-10.25/2) + \mbox{exp}(-0.5/2),\\ & \mbox{exp}(-0.125/2) + \mbox{exp}(-10.125/2),\\ & \mbox{exp}(0) + \mbox{exp}(-10.25/2),\\ & \mbox{exp}(-10.25/2) + \mbox{exp}(0)\}/(2\pi)\end{align}\]

Therefore the arccos of the dot product of the square root of the sum-normalized vectors is approximately 0.08821907.

- Between D2 and D3 we augment D2 to contain D3’s projection point and vice versa, obtaining diagrams \(D'_2 = \{(2,3.3),(0,0.5),(0.25,0.25)\}\) and \(D'_3 = \{(0,0.5),(2.65,2.65),(0.25,0.25)\}\). We compute \(\rho_1\) and \(\rho_2\) as

\[\begin{align} \rho_1 = & \{\mbox{exp}(0) + \mbox{exp}(-11.84/2) + \mbox{exp}(-12.365/2),\\ & \mbox{exp}(-11.84/2) + \mbox{exp}(0) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-12.365/2) + \mbox{exp}(-0.125/2) + \mbox{exp}(0),\\ & \mbox{exp}(-11.84/2) + \mbox{exp}(0) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-0.845/2) + \mbox{exp}(-11.645/2) + \mbox{exp}(-11.52/2),\\ & \mbox{exp}(-12.365/2) + \mbox{exp}(-0.125/2) + \mbox{exp}(0)\}/\sqrt{2\pi}\end{align}\]

\[\begin{align} \rho_2 = & \{\mbox{exp}(-11.84/2) + \mbox{exp}(-0.845/2) + \mbox{exp}(-12.365/2),\\ & \mbox{exp}(0) + \mbox{exp}(-11.645/2) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-0.125/2) + \mbox{exp}(-11.52/2) + \mbox{exp}(0),\\ & \mbox{exp}(0) + \mbox{exp}(-11.645/2) + \mbox{exp}(-0.125/2),\\ & \mbox{exp}(-11.645/2) + \mbox{exp}(0) + \mbox{exp}(-11.52/2),\\ & \mbox{exp}(-0.125/2) + \mbox{exp}(-11.52/2) + \mbox{exp}(0)\}/\sqrt{2\pi}\end{align}\]

Therefore the arccos of the dot product of the square root of the sum-normalized vectors is approximately 0.08741134.

While all three of **TDA**, **rgudhi** and
**TDApplied** provide functions for the bottleneck and
wasserstein calculations, only **TDApplied** and
**rgudhi** have the functionality to calculate the Fisher
information metric. We calculated the results of each distance
calculation, for each pair of \(D_i\)
and \(D_j\) (\(i \neq j\)) and each package, and stored
the results in tables according to the three distance metrics under
comparison:

Bottleneck comparison:

```
## pair ground_truth TDApplied TDA rgudhi
## 1 D1 and D2 0.30 0.30 0.30 0.30
## 2 D1 and D3 0.50 0.50 0.50 0.50
## 3 D2 and D3 0.65 0.65 0.65 0.65
```

Wasserstein comparison:

```
## pair ground_truth TDApplied TDA rgudhi
## 1 D1 and D2 0.3905125 0.3905125 0.55 0.55
## 2 D1 and D3 0.5590170 0.5590170 0.75 0.75
## 3 D2 and D3 0.6500000 0.6500000 0.65 0.65
```

Fisher information metric comparison:

```
## pair ground_truth TDApplied rgudhi
## 1 D1 and D2 0.02354624 0.02354624 0.02354624
## 2 D1 and D3 0.08821907 0.08821907 0.08821907
## 3 D2 and D3 0.08741134 0.08741134 0.08741134
```

All three packages agreed on the value of the three bottleneck
calculations, and both **TDApplied** and
**rgudhi** agreed on all Fisher information metric
calculations. However, while **TDA** and
**TDAstats** agreed on the three wasserstein calculations,
two of these differed from **TDApplied**’s output. This
occurred because **TDA** and **rgudhi** use a
slightly different formula for computing wasserstein distances – where
the distance between matched pairs of persistence diagram points is
Euclidean rather than an infinity-norm distance. This is a perfectly
suitable distance metric and matches the formula in (Robinson and Turner 2017). However, it is
different from the published formulas in important works like (Kerber, Morozov, and Nigmetov 2017) and (Edelsbrunner and Harer 2010) (which are the
formulas that **TDApplied** implements).

We state a quick last note of comparison between the kernel
calculations in **TDApplied** and **rgudhi**.
Even though their Fisher information metric calculations appear to be
the same (perhaps up to small differences in algorithm precision), it
turns out that **rgudhi** and **TDApplied**
return drastically different kernel values. For example, the Fisher
information metric between D2 and D3 was (correctly) stated as
0.08741134 for both packages. It follows that when \(t = 2\) the persistence Fisher kernel value
should be \(\mbox{exp}(-2*0.08741134) \approx
0.8396059\), which is the exact value of the
**TDApplied** calculation
`diagram_kernel(D2,D3,dim = 0,t = 2)`

. However, the code

```
gudhi_kern <- rgudhi::PersistenceFisherKernel$new(bandwidth = 0.5)
gudhi_kern$apply(D2[,2:3],D3[,2:3])
```

returns the value 0.7550683 (note that the `bandwidth`

parameter is \(1/t\)). Even more
perplexing is that the following code

```
gudhi_kern <- rgudhi::PersistenceFisherKernel$new(bandwidth_fisher = 0.5)
gudhi_kern$apply(D2[,2:3],D3[,2:3])
```

returns the value 1.8326, which should not be possible for the
function \(\mbox{exp}(-t*d_{FIM})\) as
\(t\) and \(d_{FIM}\) are always positive and
non-negative respectively (so the maximum value should be 1).
Unfortunately we were not able to identify the source of this confusion
by examining the source code of **rgudhi** (and GUDHI), but
it is still possible to calculate correct kernel values as follows:

```
d <- rgudhi::PersistenceFisherDistance$new() # sigma = 1
t <- 2 # or whatever desired parameter
exp(-t*d$apply(D2[,2:3],D3[,2:3]))
```

Since distance calculations are much faster with
**rgudhi** than with **TDApplied** (see the
package vignette “Benchmarking and Speedups”), and because distance and
Gram matrices can be precomputed and reused across multiple analyses
(again see “Benchmarking and Speedups”) it would be desirable to have a
(correct) **rgudhi** Gram matrix function. An example of
such a function is the following:

```
# create rgudhi distance object
# sigma = 0.01
gudhi_dist <- rgudhi::PersistenceFisherDistance$new(bandwidth = 0.01,n_jobs = 1L)
# create list of diagrams, only birth and death values in dimension 1
g <- lapply(X = 1:10,FUN = function(X){
df <- diagram_to_df(TDA::ripsDiag(X = TDA::circleUnif(n = 50),
maxdimension = 1,maxscale = 2))
return(df[which(df[,1] == 1),2:3])
})
# distance matrix function
# diagrams is a list of diagrams (only birth and death columns)
# gudhi_dist is the rgudhi distance object
gudhi_distance_matrix <- function(diagrams,gudhi_dist){
# get number of rows of each diagram since rgudhi can't calculate
# distances with empty diagrams
rows <- unlist(lapply(diagrams,FUN = nrow))
inds <- which(rows > 0)
# if inds is empty then return 0 matrix
if(length(inds) == 0)
{
return(matrix(data = 0,nrow = length(diagrams),ncol = length(diagrams)))
}
# calculate distance matrix for non-zero-row diagrams
d_non_zero <- gudhi_dist$fit_transform(diagrams[inds])
# fix diagonal which can sometimes have non-zero entries
diag(d_non_zero) <- rep(0,nrow(d_non_zero))
# symmetrize (necessary due to numeric rounding issues)
d_non_zero[which(upper.tri(d_non_zero),arr.ind = T)[,c("col","row")]] <-
d_non_zero[upper.tri(d_non_zero)]
# if all diagrams had at least one row, return
if(length(inds) == length(diagrams))
{
return(d_non_zero)
}
# create empty distance matrix d
d <- matrix(data = 0,nrow = length(diagrams),ncol = length(diagrams))
# update entries of d
e <- as.matrix(expand.grid(inds,inds))
e <- e[which(e[,1] < e[,2]),]
if(!is.matrix(e))
{
e <- t(as.matrix(e))
}
d[e] <- d_non_zero[which(upper.tri(d_non_zero),arr.ind = T)]
e <- e[,2:1]
if(!is.matrix(e))
{
e <- t(as.matrix(e))
}
d[e] <- d_non_zero[which(upper.tri(d_non_zero),arr.ind = T)]
return(d)
}
# Gram matrix function
# diagrams is a list of diagrams (only birth and death columns)
# t is the t parameter like in diagram_kernel
# gudhi_dist is the rgudhi distance object
gudhi_gram_matrix <- function(diagrams,t,gudhi_dist){
# calculate distance matrix
D <- gudhi_distance_matrix(diagrams = diagrams,gudhi_dist = gudhi_dist)
return(exp(-t*D))
}
# calculate the Gram matrix
G <- gudhi_gram_matrix(diagrams = g,t = 1,gudhi_dist = gudhi_dist)
```

Another issue with **rgudhi** calculations can be
reproduced as follows:

```
# create data frame
D = data.frame(dimension = c(0,0),birth = c(0,0),death = c(1.089866,1.098640))
# create rgudhi distance object
gudhi_dist <- rgudhi::PersistenceFisherDistance$new(bandwidth = 0.01,n_jobs = 1L)
# compute distance
gudhi_dist$apply(D[,2:3],D[,2:3])
```

`## [1] 0.00000001490116`

This calculation returns a non-zero number for the distance value of
D with itself. This is likely due to numerical rounding issues, and its
discovery in **rgudhi** has led to an update in
**TDApplied**’s `diagram_distance`

function
which now returns 0 for this calculation:

`diagram_distance(D,D,distance = "fisher",sigma = 0.001)`

`## [1] 0`

Even though the Hungarian algorithm can be used to solve the linear sum assignment problem (LSAP) (Hornik 2005), finding a minimal cost matching of two sets of points, some work needs to be done to properly apply the algorithm to calculate wasserstein or bottleneck distances. For an example we will consider the bottleneck distance, although the argument still holds with a simple change for the wasserstein distance (squaring matrix entries). Let Diag1 and Diag2 be two diagrams, with \(n_1\) and \(n_2\) points respectively, whose projections onto the diagonal are denoted by \(\pi(\mbox{Diag1})\) and \(\pi(\mbox{Diag2})\) respectively. Then take \(M\) to be the following \((n_1 + n_2) \times (n_1 + n_2)\) matrix:

\[M = \left[ \begin{array}{c|c} d_{\infty}(\mbox{Diag1},\mbox{Diag2}) & d_{\infty}(\mbox{Diag1},\pi(\mbox{Diag2})) \\ \hline d_{\infty}(\pi(\mbox{Diag1}),\mbox{Diag2}) & 0 \end{array} \right]\]

Each row corresponds to the \(n_1\) points in Diag1 followed by the \(n_2\) projections \(\pi(\mbox{Diag2})\), and vice versa for the columns. Then we claim that the solution of the LSAP problem on \(M\) has the same cost as the bottleneck distance value between Diag1 and Diag2.

Firstly, we claim that a solution to the LSAP problem on \(M\) has a cost which is no less than the distance value. Let the distance value be \(s\). Now suppose, to reach a contradiction, that there existed a lower-cost matching for the LSAP problem for \(M\), \(m\),of cost \(s' < s\). Since projection points are matched together with cost 0 in \(M\), let \(m'\) contain all the matches in \(m\) which are not between two projection points. Then \(m'\) would be matching for the distance calculation which has lower cost than \(s\), contradicting the minimality of \(s\). Therefore, a solution to the LSAP problem on \(M\) has a cost which is no less than the distance value.

Next, we claim that a solution to the LSAP problem on \(M\) has a cost which is no greater than the distance value. Now suppose, to reach a contradiction, that the solution to the LSAP on problem \(M\) had cost \(s\), which was larger than the real distance value, \(s'\). Let \(m'\) be a matching for the distance calculation of \(s'\). Then since each point in either diagram is either paired with a point in the other diagram or its own diagonal projection, there must be an equal number of unpaired points in both diagrams in \(m'\). Therefore, we can augment \(m'\) to a matching \(m\) on \(M\) in which the unpaired diagonal points are arbitrarily paired up with cost 0. Thus, \(m\) has cost \(s' < s\), contradicting the minimality of \(s\). Therefore, a solution to the LSAP problem on \(M\) has a cost which is no greater than the distance value.

Therefore, a solution to the LSAP problem for \(M\) has a cost which is both greater than and less than the bottleneck distance value, and hence the two values must be equal.

Edelsbrunner, Herbert, and John Harer. 2010. *Computational Topology:
An Introduction*. https://doi.org/10.1007/978-3-540-33259-6_7.

Fasy, Brittany T., Jisu Kim, Fabrizio Lecci, Clement Maria, David L.
Millman, and Vincent Rouvreau. 2021. *TDA: Statistical Tools for
Topological Data Analysis*. https://CRAN.R-project.org/package=TDA.

Hornik, Kurt. 2005. “A CLUE for CLUster
Ensembles.” *Journal of Statistical Software* 14
(12). https://doi.org/10.18637/jss.v014.i12.

Kerber, Michael, Dmitriy Morozov, and Arnur Nigmetov. 2017.
“Geometry Helps to Compare Persistence Diagrams.” *ACM
Journal of Experimental Algorithmics* 22 (September). https://doi.org/10.1145/3064175.

Le, Tam, and Makoto Yamada. 2018. “Persistence Fisher Kernel: A
Riemannian Manifold Kernel for Persistence Diagrams.” In
*Advances in Neural Information Processing Systems*, edited by S.
Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R.
Garnett. Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf.

Robinson, Andrew, and Katharine Turner. 2017. “Hypothesis Testing
for Topological Data Analysis.” *Journal of Applied and
Computational Topology* 1.

Stamm, Aymeric. 2023. *Rgudhi: An Interface to the GUDHI Library for
Topological Data Analysis*. https://CRAN.R-project.org/package=rgudhi.

Wadhwa, Raoul, Andrew Dhawan, Drew Williamson, and Jacob Scott. 2019.
*TDAstats: Pipeline for Topological Data Analysis*. https://github.com/rrrlw/TDAstats.