On the Variance of the Fisher Information for Deep Learning
Soen, A. & Sun, K. 2021, 'On the Variance of the Fisher Information for Deep Learning', forthcoming in NeurIPS 2021.
The Fisher information matrix (FIM) is one of the most fundamental objects in statistical machine learning. Intuitively, given a single random observation and a parametric model to fit, (Fisher) information measures how informative the observation is about the unknown parameters of the model. If the observation carries zero (Fisher) information, then parameter estimation is impossible; if it carries high information, then the parameters can be estimated more efficiently than from low-information observations. In deep learning, the FIM is closely related to the loss landscape, the variance of the parameters, second-order optimization, and deep learning theory. However, the exact FIM is either unavailable in closed form or too expensive to compute. In practice, it is almost always estimated from empirical samples. We investigate two such estimators based on two equivalent representations of the FIM -- both unbiased and consistent with respect to the underlying "true" FIM. Their estimation quality is characterized by their variance, which we give in closed form. We bound their variances and analyze how the parametric structure of a deep neural network affects the variance. We discuss the meaning of this variance measure and our bounds in the context of deep learning.
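To make the two representations concrete, the following is a minimal sketch (not the paper's code) of the two classical Monte-Carlo estimators of the FIM, written in JAX on a toy Gaussian model: one averages outer products of the score, the other averages negative log-likelihood Hessians. Both are unbiased and consistent estimators of the same FIM, but, as the paper studies, they generally differ in variance. The toy model and all function names here are illustrative assumptions.

    # Two equivalent representations of the FIM, estimated by Monte Carlo:
    #   I(theta) =  E[ s s^T ]  with s = grad_theta log p(x | theta)
    #   I(theta) = -E[ H ]      with H = hess_theta log p(x | theta)
    import jax
    import jax.numpy as jnp

    def log_prob(theta, x):
        # Toy model (assumed for illustration): x ~ Normal(theta[0], exp(theta[1])).
        mean, log_std = theta
        return (-0.5 * ((x - mean) / jnp.exp(log_std)) ** 2
                - log_std - 0.5 * jnp.log(2 * jnp.pi))

    score = jax.grad(log_prob)      # gradient of log-likelihood w.r.t. theta
    hess = jax.hessian(log_prob)    # Hessian of log-likelihood w.r.t. theta

    def fim_estimators(theta, xs):
        """Monte-Carlo FIM estimates from samples xs drawn from p(. | theta)."""
        scores = jax.vmap(score, in_axes=(None, 0))(theta, xs)    # (n, d)
        hessians = jax.vmap(hess, in_axes=(None, 0))(theta, xs)   # (n, d, d)
        fim_outer = jnp.mean(scores[:, :, None] * scores[:, None, :], axis=0)
        fim_hess = -jnp.mean(hessians, axis=0)
        return fim_outer, fim_hess

    theta = jnp.array([0.0, 0.0])              # mean 0, std 1
    key = jax.random.PRNGKey(0)
    xs = jax.random.normal(key, (10_000,))     # samples from the model at theta
    F1, F2 = fim_estimators(theta, xs)
    # For this Gaussian the exact FIM is diag(1, 2); the two estimates agree
    # with it up to Monte-Carlo noise, but with different estimator variance.
    print(F1)
    print(F2)

Characterizing and bounding the variance of such estimates, and how a deep network's parametric structure affects it, is what the paper addresses.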
See the paper here.