What is the Fisher-Rao distance?

Author: Kisung You

Affiliation: Baruch College

Published: February 5, 2025

Introduction

How can we measure the difference between two probability distributions in a principled way? One approach is through the Fisher-Rao distance, a geometric measure of dissimilarity based on the Fisher information matrix (FIM).

To understand this distance, we first examine the role of the FIM in statistical inference and its interpretation as a Riemannian metric. This metric structure naturally leads to the Fisher-Rao distance, which measures the shortest path—or geodesic—between distributions on the statistical manifold. However, directly computing this distance is often intractable.

To address this challenge, we introduce an alternative approach: the square-root transformation, which embeds probability densities into a Hilbert space. A key result follows from this transformation: the Fisher-Rao distance is exactly twice the geodesic distance between transformed densities on the unit sphere. This connection not only simplifies computations but also offers new insights into the geometry of probability distributions.

Fisher Information Matrix

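In a standard mathematical statistics course, one encounters the Fisher information matrix (FIM) for a probability density function $p(x|\theta)$ with parameter $\theta \in \Theta \subset \mathbb{R}^d$, defined as

$$I(\theta) = \mathbb{E}\left[\left(\nabla_\theta \log p(x|\theta)\right)\left(\nabla_\theta \log p(x|\theta)\right)^\top\right].$$

Alternatively, under the usual regularity conditions, it can be expressed in terms of second derivatives:

$$I(\theta) = -\mathbb{E}\left[\frac{\partial^2 \log p(x|\theta)}{\partial\theta\,\partial\theta^\top}\right].$$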
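As a quick sanity check, the two expressions for the FIM can be compared on a model where the expectation is a finite sum. The sketch below assumes a Bernoulli($\theta$) model, which is not part of the original discussion, and the function name `bernoulli_fim_two_ways` is purely illustrative.

```python
import numpy as np

def bernoulli_fim_two_ways(theta):
    """Return (outer-product form, negative-Hessian form) of the FIM for Bernoulli(theta)."""
    x = np.array([0.0, 1.0])                               # support of the distribution
    pmf = np.where(x == 1.0, theta, 1.0 - theta)           # p(x | theta)
    score = x / theta - (1.0 - x) / (1.0 - theta)          # d/dtheta log p(x | theta)
    hess = -x / theta**2 - (1.0 - x) / (1.0 - theta)**2    # d^2/dtheta^2 log p(x | theta)
    fim_score = float(np.sum(pmf * score**2))              # E[score^2]
    fim_hess = float(-np.sum(pmf * hess))                  # -E[second derivative]
    return fim_score, fim_hess

theta = 0.3
a, b = bernoulli_fim_two_ways(theta)
closed_form = 1.0 / (theta * (1.0 - theta))                # known Bernoulli Fisher information
print(a, b, closed_form)
```

Both forms should agree with the textbook value $1/(\theta(1-\theta))$ up to floating-point error.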

The Fisher information matrix plays a central role in statistical inference. In maximum likelihood estimation (MLE), it determines the asymptotic variance of the MLE:

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(\theta)\right).$$

The Cramér-Rao bound states that for any unbiased estimator $\hat{\theta}$,

$$\mathrm{Cov}(\hat{\theta}) \succeq I^{-1}(\theta),$$

where $A \succeq B$ indicates that $A - B$ is positive semi-definite. In other words, the covariance matrix of any unbiased estimator is bounded below by the inverse of the FIM. In Bayesian statistics, the FIM is used to construct priors. The Jeffreys prior, an objective, non-informative prior that is invariant under reparametrization, is given by

$$\pi(\theta) \propto \sqrt{\det I(\theta)}.$$
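The asymptotic statement can be illustrated with a small simulation. This is a minimal sketch under assumptions not in the original: a Bernoulli($\theta$) model, whose MLE is the sample mean, and arbitrary choices of `n`, `reps`, and the random seed. For $n$ i.i.d. observations, $n \cdot \mathrm{Var}(\hat{\theta}_n)$ should be close to $I^{-1}(\theta) = \theta(1-\theta)$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 500, 20000

# The MLE of a Bernoulli parameter is the sample mean; simulate it via binomial counts.
theta_hat = rng.binomial(n, theta, size=reps) / n

empirical_var = float(n * theta_hat.var())   # n * Var(theta_hat), estimated over replications
inverse_fim = theta * (1.0 - theta)          # I^{-1}(theta) for a single Bernoulli observation
print(empirical_var, inverse_fim)
```

In this particular model the sample mean is unbiased and attains the Cramér-Rao bound exactly, so the two printed numbers differ only by simulation noise.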

Fisher-Rao Metric

A key discovery in information geometry (Amari and Nagaoka 2007) is that the FIM induces a Riemannian metric on the statistical manifold $M = \{p(x|\theta)\}$, known as the Fisher-Rao metric. Given an infinitesimal displacement $d\theta$ in the parameter space $\Theta$, the squared length of the displacement under the Fisher-Rao metric is

$$ds^2 = d\theta^\top I(\theta)\, d\theta.$$

Thus, the Fisher-Rao metric captures the local geometry of the statistical manifold, with the FIM serving as the matrix of the metric in a given coordinate system.
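Two standard worked instances of the line element may help fix ideas. Neither family is discussed in the original text; they are included here only as illustrations. For the Bernoulli family with parameter $\theta \in (0, 1)$, the Fisher information is $I(\theta) = 1/(\theta(1-\theta))$, so

$$ds^2 = \frac{d\theta^2}{\theta(1-\theta)}.$$

For the univariate normal family with coordinates $\theta = (\mu, \sigma)$, the FIM is $I(\mu, \sigma) = \mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$, so

$$ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2}.$$

In both cases the same displacement $d\theta$ has a different length depending on where in the parameter space it is taken, which is what distinguishes this geometry from the Euclidean one.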

Fisher-Rao Distance

Since the Fisher-Rao metric defines a Riemannian structure, one can define a distance between two points in the manifold. The Fisher-Rao distance between two distributions, $p_1(x) = p(x|\theta_1)$ and $p_2(x) = p(x|\theta_2)$, is the length of the shortest curve (geodesic) connecting $\theta_1$ and $\theta_2$:

$$d_{FR}(p_1, p_2) := d_{FR}(\theta_1, \theta_2) = \inf_{\gamma} \int_0^1 \sqrt{\dot{\gamma}(t)^\top I(\gamma(t))\, \dot{\gamma}(t)}\, dt, \tag{1}$$

where the infimum is taken over all smooth curves $\gamma : [0, 1] \to \Theta$ such that $\gamma(0) = \theta_1$ and $\gamma(1) = \theta_2$. This formulation shows that the Fisher-Rao distance is a Riemannian geodesic distance.
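In one dimension the infimum in Equation 1 is attained by the monotone path between the endpoints, so the distance reduces to an ordinary integral of $\sqrt{I(\theta)}$. The sketch below assumes the Bernoulli family (an illustrative choice, not from the original) and compares a hand-written trapezoid rule with the known closed form $2\,|\arcsin\sqrt{\theta_2} - \arcsin\sqrt{\theta_1}|$. The function names are hypothetical.

```python
import numpy as np

def fisher_rao_bernoulli_numeric(theta1, theta2, num=200001):
    """Integrate sqrt(I(theta)) between theta1 and theta2 with the trapezoid rule."""
    lo, hi = sorted((theta1, theta2))
    grid = np.linspace(lo, hi, num)
    speed = 1.0 / np.sqrt(grid * (1.0 - grid))    # sqrt of I(theta) = 1 / (theta (1 - theta))
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(grid)))

def fisher_rao_bernoulli_closed(theta1, theta2):
    """Closed form obtained by integrating 1 / sqrt(theta (1 - theta)) analytically."""
    return float(2.0 * abs(np.arcsin(np.sqrt(theta2)) - np.arcsin(np.sqrt(theta1))))

d_num = fisher_rao_bernoulli_numeric(0.2, 0.7)
d_closed = fisher_rao_bernoulli_closed(0.2, 0.7)
print(d_num, d_closed)
```

In higher-dimensional parameter spaces the minimizing curve is not known in advance, which is where the difficulty discussed next comes from.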

Computing the Fisher-Rao Distance

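While theoretically elegant, computing the Fisher-Rao distance as in Equation 1 is challenging because it requires solving the geodesic equations

$$\frac{d^2\theta^i}{dt^2} + \sum_{j,k} \Gamma^i_{jk}\, \frac{d\theta^j}{dt} \frac{d\theta^k}{dt} = 0,$$

where $\Gamma^i_{jk}$ are the Christoffel symbols derived from the Fisher information metric:

$$\Gamma^i_{jk} = \frac{1}{2} \sum_m I^{im} \left( \frac{\partial I_{mj}}{\partial \theta^k} + \frac{\partial I_{mk}}{\partial \theta^j} - \frac{\partial I_{jk}}{\partial \theta^m} \right),$$

with $I^{im}$ denoting the entries of the inverse FIM. Closed-form solutions are known for only a handful of distribution families (Miyamoto et al. 2024).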
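To make the Christoffel formula concrete, here is a minimal sketch that evaluates it by central finite differences. It assumes the univariate normal family in coordinates $(\mu, \sigma)$, whose FIM is $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The helper names `gaussian_fim` and `christoffel` and the step size `eps` are illustrative choices, not part of any library.

```python
import numpy as np

def gaussian_fim(theta):
    """FIM of N(mu, sigma^2) in coordinates theta = (mu, sigma)."""
    _, sigma = theta
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def christoffel(fim, theta, eps=1e-5):
    """Return gamma with gamma[i, j, k] = Gamma^i_{jk}, using finite differences of the metric."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    dI = np.empty((d, d, d))                  # dI[k, m, j] = dI_{mj} / dtheta^k
    for k in range(d):
        step = np.zeros(d)
        step[k] = eps
        dI[k] = (fim(theta + step) - fim(theta - step)) / (2.0 * eps)
    inv = np.linalg.inv(fim(theta))           # entries I^{im} of the inverse FIM
    gamma = np.zeros((d, d, d))
    for i in range(d):
        for j in range(d):
            for k in range(d):
                gamma[i, j, k] = 0.5 * sum(
                    inv[i, m] * (dI[k, m, j] + dI[j, m, k] - dI[m, j, k])
                    for m in range(d)
                )
    return gamma

gamma = christoffel(gaussian_fim, (0.0, 2.0))
print(gamma[0, 0, 1], gamma[1, 0, 0], gamma[1, 1, 1])
```

For this family the nonzero symbols can be worked out by hand as $\Gamma^{\mu}_{\mu\sigma} = -1/\sigma$, $\Gamma^{\sigma}_{\mu\mu} = 1/(2\sigma)$, and $\Gamma^{\sigma}_{\sigma\sigma} = -1/\sigma$, which gives a reference for the numerical values. Solving the resulting boundary-value problem for the geodesic is a separate and harder step that this sketch does not attempt.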

Square-Root Transformation and Geodesic Distance

A practical alternative for computing the Fisher-Rao distance is the square-root transformation $p(x) \mapsto \sqrt{p(x)}$. This procedure embeds probability densities into the Hilbert space $L^2$ with the standard inner product. For $f, g \in L^2$, the inner product is given by

$$\langle f, g \rangle = \int f(x)\, g(x)\, dx.$$

It is straightforward to verify that the transformed functions lie on the infinite-dimensional unit sphere $S$. Specifically, let $f \in L^2$ be the transformed function of some density $\phi$, i.e., $f(x) = \sqrt{\phi(x)}$. Then we have

$$\|f\|^2 = \langle f, f \rangle = \int f(x)^2\, dx = \int \phi(x)\, dx = 1,$$

where the last equality follows from the fact that $\phi(x)$ is a probability density function and thus integrates to 1.

On the unit sphere in $L^2$, the geodesic between two points is the great-circle arc connecting them. Let $\psi_1$ and $\psi_2$ be two elements of $S$. Then the geodesic distance between them is the angle they subtend:

$$d_S(\psi_1, \psi_2) = \arccos\left(\langle \psi_1, \psi_2 \rangle\right).$$
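The unit-norm property and the great-circle distance can be checked numerically on a grid. This is a sketch under assumptions of my own: two unit-variance Gaussian densities with means 0 and 2, a truncated grid on $[-12, 14]$, and a plain Riemann sum standing in for the $L^2$ inner product.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-12.0, 14.0, 20001)           # grid wide enough that both tails are negligible
dx = x[1] - x[0]
psi1 = np.sqrt(gaussian_pdf(x, 0.0, 1.0))     # square-root transform of the first density
psi2 = np.sqrt(gaussian_pdf(x, 2.0, 1.0))     # square-root transform of the second density

def inner(f, g):
    """Riemann-sum approximation of the L2 inner product on the grid."""
    return float(np.sum(f * g) * dx)

norm1_sq = inner(psi1, psi1)                  # should be close to 1
norm2_sq = inner(psi2, psi2)                  # should be close to 1
cos_angle = float(np.clip(inner(psi1, psi2), -1.0, 1.0))
sphere_dist = float(np.arccos(cos_angle))     # geodesic distance on the unit sphere
print(norm1_sq, norm2_sq, cos_angle, sphere_dist)
```

For two Gaussians with equal variance $\sigma^2$, the inner product has the closed form $\exp(-(\mu_1 - \mu_2)^2 / (8\sigma^2))$, which here equals $e^{-1/2}$ and can serve as a reference value.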

Equivalence of Fisher-Rao Distance and Geodesic Distance of Square Root Densities

We are now ready to state the main result of this discussion. The Fisher-Rao distance between two distributions, $p_1(x) = p(x|\theta_1)$ and $p_2(x) = p(x|\theta_2)$, is twice the geodesic distance between their square-root transformed densities, $\psi_1(x) = \sqrt{p_1(x)}$ and $\psi_2(x) = \sqrt{p_2(x)}$, on $S$:

$$d_{FR}(p_1, p_2) = 2 \arccos\left( \int \sqrt{p_1(x)\, p_2(x)}\, dx \right). \tag{2}$$
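For discrete distributions the integral in Equation 2 becomes a sum, which makes the formula easy to test. The sketch below assumes a Bernoulli pair (an illustrative choice) and compares Equation 2 with the arcsine expression $2\,|\arcsin\sqrt{\theta_2} - \arcsin\sqrt{\theta_1}|$ obtained by integrating $\sqrt{I(\theta)}$ directly. The function name `fisher_rao_sqrt` is hypothetical, and the quantity `bc` is commonly called the Bhattacharyya coefficient.

```python
import numpy as np

def fisher_rao_sqrt(p, q):
    """Equation 2 for discrete distributions: 2 * arccos(sum_i sqrt(p_i q_i))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))                       # inner product of the square-root vectors
    return 2.0 * float(np.arccos(np.clip(bc, 0.0, 1.0)))

theta1, theta2 = 0.2, 0.7
d_sqrt = fisher_rao_sqrt([1 - theta1, theta1], [1 - theta2, theta2])
d_arc = float(2.0 * abs(np.arcsin(np.sqrt(theta2)) - np.arcsin(np.sqrt(theta1))))
print(d_sqrt, d_arc)
```

The two values agree because the Bernoulli family already covers every distribution on two points, so the great-circle arc between the square-root vectors never leaves the family. The `np.clip` call guards against rounding pushing the sum slightly above 1.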

To verify Equation 2, consider an infinitesimal perturbation of $p(x)$:

$$p(x) \to p(x) + \epsilon h(x), \tag{3}$$

where $h(x)$ is an arbitrary function that integrates to 0 and $\epsilon > 0$ is small. The zero-integral constraint ensures that $p(x) + \epsilon h(x)$ still integrates to 1; for it to remain a valid density it must also stay nonnegative, which holds for sufficiently small $\epsilon$ when $h/p$ is bounded. Our goal is to quantify how this small variation in $p(x)$ changes the transformed function $\psi(x) = \sqrt{p(x)}$. Viewing the right-hand side of Equation 3 as a function of $\epsilon$, a first-order Taylor expansion of the square root gives

$$\sqrt{p + \epsilon h} = \sqrt{p} + \frac{1}{2}\, \epsilon\, \frac{h}{\sqrt{p}} + O(\epsilon^2).$$

Thus, the infinitesimal perturbation in $\psi(x)$ is

$$\delta\psi(x) = \sqrt{p(x) + \epsilon h(x)} - \sqrt{p(x)} = \frac{1}{2} \frac{h(x)}{\sqrt{p(x)}}\, \epsilon + O(\epsilon^2). \tag{4}$$

From Equation 4, we observe that, to first order, $\delta\psi(x)$ is linear in $h(x)$, showing how small changes in $p(x)$ translate to changes in $\psi(x)$. Recall that the Fisher-Rao metric measures infinitesimal displacements in the space of distributions. Given the embedding of densities via the square-root transformation, it is natural to measure the perturbation by its squared $L^2$ norm, keeping only the leading-order term:

$$ds^2 = \|\delta\psi\|^2 = \int \left( \frac{\epsilon}{2} \frac{h(x)}{\sqrt{p(x)}} \right)^2 dx = \frac{\epsilon^2}{4} \int \frac{h(x)^2}{p(x)}\, dx. \tag{5}$$
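The leading-order claim in Equation 5 can be probed numerically. The sketch below uses a small discrete density and a zero-sum perturbation, both chosen arbitrarily for illustration, and reports the ratio between the exact squared norm of $\delta\psi$ and the prediction $\frac{\epsilon^2}{4} \sum_i h_i^2 / p_i$.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])       # a discrete density (sums to 1)
h = np.array([1.0, -1.0, 0.5, -0.5])     # perturbation direction (sums to 0)

def ratio(eps):
    """Exact ||sqrt(p + eps h) - sqrt(p)||^2 divided by the leading-order prediction."""
    exact = np.sum((np.sqrt(p + eps * h) - np.sqrt(p)) ** 2)
    predicted = eps**2 / 4.0 * np.sum(h**2 / p)
    return float(exact / predicted)

r_coarse = ratio(1e-3)
r_fine = ratio(1e-4)
print(r_coarse, r_fine)
```

If the expansion is right, the ratio should approach 1 as $\epsilon$ shrinks, with a deviation that is roughly proportional to $\epsilon$.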

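To connect Equation 5 with the Fisher information, let the perturbation come from a parameter displacement $d\theta$, so that $\epsilon h(x) = \nabla_\theta p(x|\theta)^\top d\theta$ to first order. Since $\nabla_\theta p = p\, \nabla_\theta \log p$, the integral becomes

$$\int \frac{(\epsilon h(x))^2}{p(x)}\, dx = d\theta^\top\, \mathbb{E}\left[ (\nabla_\theta \log p)(\nabla_\theta \log p)^\top \right] d\theta = d\theta^\top I(\theta)\, d\theta,$$

which is exactly the squared line element of the Fisher-Rao metric. Consequently, we obtain

$$\|\delta\psi\|^2 = \frac{1}{4}\, d\theta^\top I(\theta)\, d\theta,$$

which reveals that the standard metric on $S$ induced by the square-root transformation is $\frac{1}{4}$ times the Fisher-Rao metric. Equivalently, the Fisher-Rao metric is 4 times the standard metric on $S$.

Since curve lengths scale with the square root of the metric factor, we conclude that the geodesic distance computed in the Fisher-Rao metric is twice the standard geodesic distance on the unit sphere in $L^2$. That is, for two densities $p_1$ and $p_2$ and their transformations $\psi_1 = \sqrt{p_1}$ and $\psi_2 = \sqrt{p_2}$ in $L^2$, we have

$$d_{FR}(p_1, p_2) = 2 \times d_S(\psi_1, \psi_2) = 2 \arccos\left( \int \sqrt{p_1(x)\, p_2(x)}\, dx \right).$$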

Final Thoughts

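  1. In the derivation of the Fisher-Rao distance in terms of the scaled geodesic distance on $S$, no specific choice or constraint is imposed on $\theta$. The formulation therefore extends naturally to nonparametric probability densities, as the geodesic distance in $L^2$ depends solely on the embedding. One caveat: Equation 2 is the distance on the space of all densities, where the great-circle geodesic is free to leave any given parametric family. If curves are required to stay inside a family $\{p(x|\theta)\}$, as in Equation 1, the resulting distance is in general at least as large as the value in Equation 2.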
  2. Despite the equivalence, some concerns remain regarding the computability of the Fisher-Rao distance. For instance, evaluating the integral can be challenging in many cases, particularly for high-dimensional distributions or arbitrary nonparametric densities, where numerical integration becomes intractable. Approximation via Monte Carlo methods, while useful, presents its own set of difficulties. Moreover, when densities are estimated, they may contain regions of extremely small values, leading to numerical precision issues.
  3. I would like to express my appreciation to Prof. Marco Radeschi for his kindness and patience in enduring my endless questions in his differential geometry course - many of which, in hindsight, seem absurd.
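The Monte Carlo approximation mentioned in item 2 can be sketched as follows. The integral in Equation 2 can be rewritten as $\int \sqrt{p_1 p_2}\, dx = \mathbb{E}_{p_1}\left[\sqrt{p_2(X)/p_1(X)}\right]$, which suggests averaging over samples from $p_1$. Everything specific here is an assumption for illustration: the two Gaussian densities, the sample size, the seed, and the function names. Working with log-densities is one simple way to soften the precision issues caused by very small density values.

```python
import numpy as np

def log_gaussian_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

def fisher_rao_mc(mu1, sigma1, mu2, sigma2, n=200_000, seed=0):
    """Monte Carlo estimate of 2 * arccos(E_{p1}[sqrt(p2 / p1)])."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu1, sigma1, size=n)                    # samples from p1
    log_ratio = log_gaussian_pdf(x, mu2, sigma2) - log_gaussian_pdf(x, mu1, sigma1)
    bc = np.mean(np.exp(0.5 * log_ratio))                  # estimates the integral of sqrt(p1 p2)
    return 2.0 * float(np.arccos(np.clip(bc, 0.0, 1.0)))

estimate = fisher_rao_mc(0.0, 1.0, 2.0, 1.0)
reference = 2.0 * float(np.arccos(np.exp(-0.5)))          # Equation 2 with the closed-form integral
print(estimate, reference)
```

The estimator is noisy when the two densities overlap little, because the ratio $p_2/p_1$ then has heavy tails under $p_1$. This is one concrete form of the difficulties noted above.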

References

Amari, Shun’ichi, and Hiroshi Nagaoka. 2007. Methods of Information Geometry. Translated by Daishi Harada. Reprint. Translations of Mathematical Monographs 191. Providence, Rhode Island: American Mathematical Society.
Miyamoto, Henrique K., Fábio C. C. Meneghetti, Julianna Pinele, and Sueli I. R. Costa. 2024. “On Closed-Form Expressions for the Fisher-Rao Distance.” Information Geometry 7 (2): 311–54. https://doi.org/10.1007/s41884-024-00143-2.

Footnotes

  1. There is an interesting connection to the objective Bayesian framework, which will appear in a later post.↩︎

  2. From a geometric point of view, $\delta\psi(x)$ belongs, to first order in $\epsilon$, to the tangent space of $S$ at the point $\psi(x)$.↩︎

Citation

BibTeX citation:
@online{you2025,
  author = {You, Kisung},
  title = {What Is the {Fisher-Rao} Distance?},
  date = {2025-02-05},
  url = {https://kisungyou.com/Blog/blog_001_FisherRao.html},
  langid = {en}
}
For attribution, please cite this work as:
You, Kisung. 2025. “What Is the Fisher-Rao Distance?” February 5, 2025. https://kisungyou.com/Blog/blog_001_FisherRao.html.