It has been a long time since I last blogged. João and I have been busy getting ready to launch a new project to streamline multi-cloud management, which we have been working on for the last couple of months. We will talk about it in more detail soon, so stay tuned.
Meanwhile, I just wanted to share a very interesting post from the Google AI blog, "Video Understanding Using Temporal Cycle-Consistency Learning", where they propose a self-supervised learning method to classify different actions, postures, and so on in videos.
This approach aims to overcome the expensive and time-consuming process of manually labeling videos frame by frame. It does so by using "[…] correspondences between examples of similar sequential processes to learn representations particularly well-suited for fine-grained temporal understanding of videos". The process involves training a frame encoder network, such as a ResNet, to produce per-frame embeddings; picking a reference frame from one video and finding its nearest neighbor (NN) among the embeddings of a second video; and then, for the cycle part, applying NN from video 2 back to video 1 and checking that it lands on the original reference frame.
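To make the cycle idea concrete, here is a minimal sketch of the hard nearest-neighbor cycle-consistency check, assuming you already have per-frame embeddings from the encoder. The function names (`nearest_neighbor`, `is_cycle_consistent`) and the random toy embeddings are my own illustrations, not the authors' code; the actual method trains with a soft, differentiable version of this matching so the cycle-back error can serve as a loss.

```python
import numpy as np

def nearest_neighbor(query, candidates):
    """Return the index of the candidate embedding closest to the query."""
    distances = np.linalg.norm(candidates - query, axis=1)
    return int(np.argmin(distances))

def is_cycle_consistent(emb1, emb2, i):
    """Check cycle consistency for frame i of video 1.

    emb1, emb2: (num_frames, dim) arrays of frame embeddings produced by
    the encoder. Frame i of video 1 is matched to its nearest neighbor j
    in video 2; the cycle is consistent if matching j back into video 1
    returns the original index i.
    """
    j = nearest_neighbor(emb1[i], emb2)  # video 1 -> video 2
    k = nearest_neighbor(emb2[j], emb1)  # video 2 -> video 1
    return k == i

# Toy example with random embeddings (in practice these would come
# from the trained encoder).
rng = np.random.default_rng(0)
emb1 = rng.standard_normal((20, 128))
emb2 = rng.standard_normal((24, 128))
print(is_cycle_consistent(emb1, emb2, i=5))
```

Frames that fail to cycle back to themselves are exactly the ones that drive the training signal, pushing the encoder toward embeddings where corresponding phases of the two videos line up.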
I strongly encourage you to read the blog post, as it does a great job of explaining the approach.
Last but not least, the code base is available here.