Classification of gene trees is an important task both in the analysis of multi-locus phylogenetic data and in assessing the convergence of Markov Chain Monte Carlo (MCMC) analyses used in Bayesian phylogenetic tree reconstruction. The logistic regression model is one of the most popular classification models in statistical learning, thanks to its computational speed and interpretability. However, it is not appropriate to apply the logistic regression model directly to a set of phylogenetic trees with the same set of leaf labels, because the space of phylogenetic trees is not Euclidean. It is well known in tropical geometry and phylogenetics that the space of phylogenetic trees is a tropical linear space with respect to the max-plus algebra. In recent years, tropical geometry has found applications in statistical learning over the space of phylogenetic trees. In this talk, we propose an analogue of the logistic regression model in the setting of tropical geometry. We consider two cases, in which the number of species trees is fixed at one and at two, and we estimate the species tree(s) from a sample of gene trees distributed over the space of ultrametrics, which is a tropical linear space. We show that both models are statistically consistent, and we derive bounds on the generalization error of both models.
Tropical Logistic Regression Model on Space of Phylogenetic Trees
Ruriko Yoshida, Naval Postgraduate School
Authors: Georgios Aliatimis, Ruriko Yoshida, Burak Boyaci, and James A. Grant
2023 AWM Research Symposium
Tropical Geometry [Organized by Josephine Yu and Abeer Al Ahmadieh]
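
The background fact the abstract relies on, that ultrametrics are closed under max-plus (tropical) linear combinations but not under ordinary Euclidean averaging, can be illustrated with a minimal sketch. This is not the authors' regression model, which the abstract does not specify. The function names `is_ultrametric` and `tropical_combination`, the helper `matrix`, and the two four-leaf distance matrices are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of leaves, the largest of the
    three pairwise distances must be attained at least twice."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
        if c - b > tol:
            return False
    return True


def tropical_combination(u, v, a=0.0, b=0.0):
    """Max-plus linear combination: entrywise max(a + u, b + v) off the
    diagonal, with the diagonal kept at zero."""
    n = len(u)
    return [[0.0 if i == j else max(a + u[i][j], b + v[i][j])
             for j in range(n)] for i in range(n)]


def matrix(pairs, n=4):
    """Build a symmetric n x n distance matrix from {(i, j): distance}."""
    d = [[0.0] * n for _ in range(n)]
    for (i, j), w in pairs.items():
        d[i][j] = d[j][i] = w
    return d


# Hypothetical ultrametrics for two tree shapes on leaves 0..3 (assumed data).
u = matrix({(0, 1): 1, (0, 2): 2, (1, 2): 2, (0, 3): 3, (1, 3): 3, (2, 3): 3})
v = matrix({(0, 2): 1, (1, 3): 2, (0, 1): 3, (0, 3): 3, (1, 2): 3, (2, 3): 3})

# The tropical combination stays inside the space of ultrametrics.
w = tropical_combination(u, v, a=1.0, b=0.0)

# The ordinary (Euclidean) average does not.
avg = [[(u[i][j] + v[i][j]) / 2 for j in range(4)] for i in range(4)]

print(is_ultrametric(u), is_ultrametric(v), is_ultrametric(w), is_ultrametric(avg))
```

In this example the Euclidean midpoint of two valid tree metrics fails the three-point condition, which is one concrete way to see why a classifier built on ordinary linear combinations does not transfer directly to tree space.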