Contrastive predictive coding for video representation learning
Abstract
Contrastive Predictive Coding (CPC) (van den Oord et al., 2018) has been used successfully to learn representations of various signals (audio, text, images). It combines autoregressive modeling with contrastive estimation to learn long-term temporal relations within the raw signal while remaining robust to local noise. The result is a higher-level representation of the signal that is useful for solving downstream tasks. Applying CPC to video remains challenging because of the structure and high dimensionality of the signal. In this work, we propose several implementations of CPC for video signals. The learned representation improves the performance of an action recognition classifier.
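The combination of autoregressive modeling and contrastive estimation can be illustrated with a minimal NumPy sketch of the generic CPC objective. This is not the video-specific implementation proposed here. It assumes per-time-step latents `z` (random arrays standing in for the output of a video encoder), a plain tanh RNN as the autoregressive context model (the original CPC paper uses a GRU), one linear prediction matrix per step ahead `k`, and negatives taken from the other sequences in the batch. All function names and shapes are hypothetical and chosen for illustration only.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def autoregressive_context(z, w_in, w_rec):
    """Summarize z[:, :t+1] into a context vector c_t for every t.

    z: (batch, time, dim) latents. Returns (batch, time, ctx_dim).
    Assumption: a plain tanh RNN stands in for the GRU of the original CPC.
    """
    batch, steps, _ = z.shape
    h = np.zeros((batch, w_rec.shape[0]))
    contexts = []
    for t in range(steps):
        h = np.tanh(z[:, t] @ w_in + h @ w_rec)
        contexts.append(h)
    return np.stack(contexts, axis=1)

def info_nce_loss(z, c, w_pred, t, k):
    """InfoNCE loss for predicting z[:, t + k] from the context c[:, t].

    w_pred[k - 1] is the (hypothetical) linear predictor for k steps ahead.
    For each sequence i, the positive is its own z_i[t + k]; the negatives
    are z_j[t + k] from the other sequences j in the batch.
    """
    pred = c[:, t] @ w_pred[k - 1]    # (batch, dim): predicted future latents
    target = z[:, t + k]              # (batch, dim): actual future latents
    scores = pred @ target.T          # scores[i, j] = similarity of pred_i and z_j
    log_probs = log_softmax(scores, axis=1)
    # Positives lie on the diagonal; minimize their negative log-probability.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
batch, steps, dim, ctx_dim, max_k = 8, 10, 16, 32, 3
z = rng.standard_normal((batch, steps, dim))  # stand-in for encoded video clips
w_in = 0.1 * rng.standard_normal((dim, ctx_dim))
w_rec = 0.1 * rng.standard_normal((ctx_dim, ctx_dim))
w_pred = [0.1 * rng.standard_normal((ctx_dim, dim)) for _ in range(max_k)]

c = autoregressive_context(z, w_in, w_rec)
# Average the loss over prediction horizons k = 1..max_k from time step t = 5.
loss = np.mean([info_nce_loss(z, c, w_pred, t=5, k=k) for k in range(1, max_k + 1)])
print(loss)
```

With untrained weights the loss sits near log(batch), the value obtained when the positive is indistinguishable from the negatives. Training the encoder, context model and predictors lowers it by making the true future latent score higher than the latents of the other clips. Drawing negatives from the batch is one common choice among several.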
Domains
Artificial Intelligence [cs.AI]
Origin: Files produced by the author(s)