We define and characterize the “chained” Kullback-Leibler divergence min<sub>w</sub> D(p||w) + D(w||q) minimized over all intermediate distributions w and the analogous k-fold chained K-L divergence min D(p||w<sub>k -1</sub>) + ... + D(w<sub>2</sub>||w<sub>1</sub>) + D(w<sub>1</sub>||q) minimized over the entire path (w<sub>1</sub>, ... w<sub>k -1</sub>). This quantity arises in a large deviations analysis of a Markov chain on the set of types - the Wright-Fisher model of neutral genetic drift: a population with allele distribution q produces offspring with allele distribution w, which then produce offspring with allele distribution p, and so on. The chained divergences enjoy some of the same properties as the K-L divergence (like joint convexity in the arguments) and appear in k-step versions of some of the same settings as the K-L divergence (like information projections and a conditional limit theorem). We further characterize the optimal k-step "path" of distributions appearing in the definition and apply our findings in a large deviations analysis of the Wright-Fisher process. We make a connection to information geometry via the previously studied continuum limit, where the number of steps tends to infinity, and the limiting path is a geodesic in the Fisher information metric. Finally, we offer a thermodynamic interpretation of the chained divergence (as the rate of operation of an appropriately defined Maxwell's demon) and we state some natural extensions and applications (a k-step mutual information and k-step maximum likelihood inference). We release code for computing the objects we study.
Download Full PDF Version (Non-Commercial Use)