Computing Approximate Statistical Discrepancy

Published in International Symposium on Algorithms and Computation, 2018

Recommended citation: Michael Matheny and Jeff M. Phillips. Computing Approximate Statistical Discrepancy. In 29th International Symposium on Algorithms and Computation (arXiv:1804.11307), 2018.

Consider a geometric range space $(X,\mathcal{A})$ where $X$ is comprised of the union of a red set $R$ and blue set $B$. Let $\Phi(A)$ define the absolute difference between the fraction of red and fraction of blue points which fall in the range $A$. The maximum discrepancy range $A^* = \arg \max_{A \in (X,\mathcal{A})} \Phi(A)$. Our goal is to find some $\hat{A} \in (X,\mathcal{A})$ such that $\Phi(A^*) - \Phi(\hat A) \leq \varepsilon$. We develop general algorithms for this approximation problem for range spaces with bounded VC-dimension, as well as significant improvements for specific geometric range spaces defined by balls, halfspaces, and axis-aligned rectangles. This problem has direct applications in discrepancy evaluation and classification, and we also show an improved reduction to a class of problems in spatial scan statistics.

Download paper here