Measuring Transferability: some recent insights
Training data is often not fully representative of the target
population due to bias in the sampling mechanism; in such situations,
we aim to ‘transfer’ relevant information from the training data
(a.k.a. source data) to the target application. How much information
is in the source data? How much target data should we collect, if any?
These are all practical questions that depend crucially on ‘how far’
the source domain is from the target. However, it remains generally
unclear how to properly measure ‘distance’ between source and target.
In this talk we will argue that many of the traditional notions of
‘distance’ (e.g. KL-divergence, extensions of total variation such as
the D_A discrepancy, and even density-ratios) can yield an
over-pessimistic picture of transferability. In fact, many of these
measures are ill-defined or too large in common situations where, intuitively,
transfer should be possible (e.g. situations with structured data of
differing dimensions, or situations where the target distribution puts
significant mass in regions of low source mass). Instead, we show that
a notion of ‘relative dimension’ between source and target (which we
simply term the ‘transfer-exponent’) captures a continuum from easy to
hard transfer. The transfer-exponent uncovers a rich set of situations
where transfer is possible even at fast rates, helps answer questions
such as how much unlabeled data can help, and has interesting
implications for related problems such as multi-task learning.
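To make the contrast concrete: if the target distribution P_T puts
positive mass on a region where the source P_S puts none, then the
density-ratio dP_T/dP_S is unbounded and KL(P_T || P_S) is infinite,
even though nearby source data may still be informative. A
relative-dimension notion instead compares how quickly source and
target mass accumulate around the same points. As a rough,
illustrative sketch (one formalization from the covariate-shift
literature; the notation P_S, P_T, B(x,r), γ, C_γ is introduced here
for illustration and need not match the exact definitions used in the
talk), a transfer-exponent γ ≥ 0 with constant C_γ > 0 asks that

  P_S(B(x,r)) ≥ C_γ · r^γ · P_T(B(x,r)),  for all x in the support of
  P_T and all sufficiently small radii r > 0.

Here γ = 0 says the source covers target neighborhoods as well as the
target itself, while larger γ quantifies how much thinner the source
is where the target concentrates; under such a condition, transfer
rates can degrade gracefully with γ rather than collapsing to ‘no
transfer’ the moment the divergences above blow up.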