Efficient active learning of sparse halfspaces
We study the problem of PAC active learning of homogeneous linear
classifiers (halfspaces) in R^d, where the goal is to learn a
halfspace with low error using as few label queries as possible. Under
the extra assumption that there is a t-sparse halfspace that performs
well on the data (t << d), we would like our active learning algorithm
to be attribute efficient, i.e., to have a label requirement sublinear
in d. In this talk, we present a computationally efficient algorithm
that achieves this goal. Under certain distributional assumptions on
the data, our algorithm achieves a label requirement of O(t polylog(d,
1/epsilon)). In contrast, existing algorithms in this setting are
either computationally inefficient or have a label requirement
polynomial in d or 1/epsilon.
(The talk is based on a paper that appeared in COLT 2018.)
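
For intuition, here is a minimal, hypothetical sketch of how an
attribute-efficient active learner might operate: query labels only for
points near the current decision boundary, make perceptron-style
updates, and hard-threshold the iterate back to t non-zero coordinates
after each round. This is an illustrative assumption, not the algorithm
presented in the talk; the function names, margin-band schedule, and
thresholding step are all placeholders.

    import numpy as np

    def hard_threshold(w, t):
        """Keep the t largest-magnitude coordinates of w, zero the rest
        (illustrative sparsity projection, not necessarily the paper's)."""
        keep = np.argsort(np.abs(w))[-t:]
        w_sparse = np.zeros_like(w)
        w_sparse[keep] = w[keep]
        return w_sparse

    def active_learn_sparse_halfspace(X, query_label, t,
                                      rounds=10, batch_size=20, seed=0):
        """X: (n, d) pool of unlabeled points.
        query_label(i): returns the {-1, +1} label of point i; each call
        is one label query, the resource being minimized."""
        n, d = X.shape
        rng = np.random.default_rng(seed)
        w = np.zeros(d)
        queries = 0
        band = 1.0  # width of the margin band around the boundary (toy schedule)
        for _ in range(rounds):
            # Query only points near the current boundary; these are the
            # informative ones, which is what keeps label usage low.
            margins = np.abs(X @ w)
            candidates = np.flatnonzero(margins <= band)
            if candidates.size == 0:
                break
            batch = rng.choice(candidates,
                               size=min(batch_size, candidates.size),
                               replace=False)
            for i in batch:
                y = query_label(i)
                queries += 1
                if y * (X[i] @ w) <= 0:  # perceptron-style update on mistakes
                    w = w + y * X[i]
            w = hard_threshold(w, t)  # project back onto t-sparse vectors
            band /= 2.0  # shrink the band each round (illustrative)
        return w, queries

    # Toy usage: planted 3-sparse target in d = 100 dimensions.
    d, n, t = 100, 5000, 3
    rng = np.random.default_rng(1)
    w_star = np.zeros(d)
    w_star[:t] = 1.0
    X = rng.standard_normal((n, d))
    labels = lambda i: 1 if X[i] @ w_star >= 0 else -1
    w_hat, used = active_learn_sparse_halfspace(X, labels, t)
    print(f"label queries used: {used} out of a pool of {n}")

By construction this loop queries at most rounds * batch_size labels,
far fewer than labeling the whole pool; the talk's contribution is an
algorithm and analysis making such savings rigorous, with the
O(t polylog(d, 1/epsilon)) guarantee above.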