From Object Interactions to Fine-grained Video Understanding

(Joint work by Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf)

Video understanding tasks such as action recognition and caption generation are crucial for real-world applications such as surveillance, video retrieval, and human behavior understanding. In this work, we present a generic recurrent module that detects relationships and interactions between arbitrary object groups for fine-grained video understanding. Our approach applies to a range of open-domain video understanding problems. We validate it on two tasks with new, challenging datasets: fine-grained action recognition on Kinetics and visually grounded video captioning on ActivityNet Captions.


Invited SPE Webinar

Successful Leveraging of Human Visual System Modeling and Machine Learning in Computational Seismic Interpretation

As seismic data grow in both size and resolution, manual interpretation increasingly relies on computational seismic interpretation (CSI) to be more efficient, accurate, and effective. This webinar will highlight our studies on leveraging perception and machine learning to create a set of CSI algorithms and software tools.