Pattern Theory Lunch Seminar Series

May 6, 2015

12-1pm, 182 George St Room 110

Two Recent Information Theoretic Variations on the Theme of Patterns in Security

Muriel Medard

Cecil H. Green Professor
Electrical Engineering and Computer Science Department
Massachusetts Institute of Technology

We overview two different sets of results based upon the effect of patterns in security. In the first part, we consider limits of inference, a problem that emerges when we seek to ascertain what limits to privacy we can expect when machine learning algorithms, whose theoretical basis often relies on principal inertia components, are applied to mining publicly available data that may be related, in loosely known ways, to private data. Lower bounds for the average probability of error of estimating a hidden variable X given an observation of a correlated random variable Y, and Fano’s inequality in particular, play a central role in information theory. We present a lower bound for the average estimation error based on the marginal distribution of X and the principal inertias of the joint distribution matrix of X and Y, providing thus limits to privacy. Furthermore, we investigate how to answer a fundamental question in inference and privacy: given an observation Y, can we estimate a function f(X) of the hidden random variable X with an average error below a certain threshold? We provide a general method for answering this question using an approach based on rate-distortion theory.

In the second part, we consider recent results on guesswork, the characterization of the process sequences such as passwords. We note that, what may appear as being even slight differences in distributions of these sequences may lead to differences that are exponential in guesswork, leading to possibly surprising results, such as the failure of the oft-assumed uniformity of compressed sources, and the fact that friendly jamming of an intended user may be advantageous. We conclude with our recently defined notion of inscrutability rate, used to quantify the asymptotic difficulty of guessing U out of V secret strings, Unexpectedly, the inscrutability rate of any finite-order Markov string-source with hidden statistics remains the same as the unhidden case, i.e., the asymptotic value of hiding the statistics per each symbol is vanishing.