Predicting the Interestingness of Videos

A Computational Approach



Figure 1: Two example clips. The left one is considered more interesting.

Overview

The number of videos available on the Web is growing explosively. While some videos are very interesting and receive high ratings from viewers, many others are less interesting or even boring. A measure of video interestingness could improve user satisfaction in many applications. For example, in Web video search, among videos with similar relevance to a query, the more interesting ones should be ranked higher. The measure is equally useful in video recommendation: if the recommended videos are interesting and attractive, users will be more satisfied and the stickiness of a video-sharing website will improve considerably.

In this project, we conduct a pilot study on human perception of video interestingness and demonstrate a simple computational method for identifying the more interesting videos. To this end, we first construct two datasets of Flickr and YouTube videos, respectively. Human judgments of interestingness are collected and used as the ground truth for training computational models. On both datasets, we evaluate several off-the-shelf visual and audio features that are potentially useful for predicting interestingness; a minimal sketch of such feature extraction is given below.
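The following sketch shows how off-the-shelf features of this kind might be computed, assuming OpenCV (with SIFT support) and librosa are available; the function names, file paths, and parameter choices are illustrative, not the exact pipeline from the paper.

import cv2
import librosa
import numpy as np

def visual_sift_descriptors(frame_path):
    """Extract SIFT descriptors from one video frame (grayscale)."""
    gray = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors  # (num_keypoints, 128) array, or None if no keypoints

def audio_mfcc_features(audio_path):
    """Extract MFCCs from a clip's audio track, averaged over time."""
    signal, sr = librosa.load(audio_path, sr=None)  # keep native sampling rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # one 13-dimensional vector per clip

In practice, the frame-level SIFT descriptors would be quantized into a bag-of-words histogram so that every video ends up with a fixed-length vector.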

Related Publication:

Yu-Gang Jiang, Yanran Wang, Rui Feng, Xiangyang Xue, Yingbin Zheng, Hanfang Yang, Understanding and Predicting Interestingness of Videos, The 27th AAAI Conference on Artificial Intelligence (AAAI), Bellevue, Washington, USA, Jul. 2013.


Datasets

To facilitate the study, we need benchmark datasets with ground-truth interestingness labels. Since no such dataset is publicly available, we collected two new ones. The first dataset (1,200 videos) was collected from Flickr, which uses a criterion called "interestingness" to rank its search results. The second dataset (420 videos) was collected from YouTube, which offers no comparable ranking criterion, so we hired 10 human annotators to provide interestingness ratings for the videos (one simple way to aggregate such ratings is sketched below).
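As an illustration only, one simple way to turn the 10 annotators' ratings into a single ground-truth score per video is to average them; the data layout and the averaging rule below are assumptions, not the exact annotation protocol used for the dataset.

import numpy as np

def aggregate_ratings(ratings):
    """Average each video's ratings across annotators."""
    return {vid: float(np.mean(rs)) for vid, rs in ratings.items()}

# Hypothetical ratings from 10 annotators on a 1-5 scale.
ratings = {
    "clip_001": [4, 5, 3, 4, 4, 5, 4, 3, 4, 5],
    "clip_002": [2, 1, 2, 3, 2, 2, 1, 2, 2, 3],
}
scores = aggregate_ratings(ratings)  # clip_001 is judged more interesting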

Click here to download the dataset (~7.3GB in total).
Note: People who download this dataset must agree that 1) the data will be used for research purposes only, and 2) the authors of the above AAAI'13 paper and Fudan University make no warranties regarding this dataset, including (but not limited to) warranties of non-infringement.


Computational Approach


We designed and implemented a computational system that compares the interestingness of videos using a large variety of features, including visual (SIFT), audio (MFCC), and attribute (Object Bank) features. Given two videos, the system automatically predicts which one is more interesting. The prediction framework and some representative results are shown in the figure below. Overall, we observed very promising results on both datasets; for more details, please refer to our AAAI 2013 paper. A sketch of the pairwise prediction idea follows.
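Below is a minimal sketch of pairwise prediction in the spirit of a ranking SVM, assuming every video has already been mapped to a fixed-length feature vector; the placeholder features, pairs, and SVM settings are illustrative assumptions, not the configuration reported in the paper.

import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(features, pairs):
    """Turn (more, less) interesting index pairs into difference
    vectors labeled +1/-1 for a linear classifier."""
    X, y = [], []
    for i, j in pairs:  # video i was judged more interesting than video j
        X.append(features[i] - features[j]); y.append(1)
        X.append(features[j] - features[i]); y.append(-1)
    return np.array(X), np.array(y)

# Placeholder inputs: a feature matrix and ground-truth orderings.
features = np.random.rand(100, 64)
pairs = [(i, i + 1) for i in range(0, 98, 2)]

X, y = pairwise_transform(features, pairs)
ranker = LinearSVC(C=1.0).fit(X, y)

def more_interesting(fa, fb):
    """Return True if video a is predicted more interesting than b."""
    return ranker.decision_function((fa - fb).reshape(1, -1))[0] > 0

Because the classifier is linear, training yields a single weight vector w, so each video can also be scored independently as w · f, which makes ranking an entire collection cheap.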


Figure 2: The prediction framework of our computational system (left) and a subset of the prediction results (right). Visual, audio, and attribute-based features are all useful, and combining multimodal features leads to further improvements. See the paper for more details.