What Makes for A Good Thumbnail? Video Content Summarization into A Single Image
Wednesday, Aug 6: 3:25 PM - 3:45 PM
Topic-Contributed Paper Session
Music City Center
Thumbnails, reduced-size preview images or clips, serve as pivotal visual cues that help consumers navigate video selection while previewing what to expect in the video. This paper provides a scalable framework that combines state-of-the-art computer vision techniques, a novel video platform design, and a novel use of a Bayesian learning model in a high-dimensional context to study (i) how thumbnails, relative to video content, affect viewers' behavior, and (ii) how to optimize video thumbnail selection under different creator objectives. To achieve this, we first propose a video mining procedure that automatically decomposes high-dimensional video data into interpretable features using computer vision, deep learning, and LLMs. Motivated by behavioral theories such as expectation-disconfirmation theory and Loewenstein's theory of curiosity, we then construct theory-based measures to assess the roles thumbnails play in shaping video reactions. Using both secondary data from YouTube and a novel video platform called "CTube" that we built to exogenously randomize thumbnails across videos, we find that content disconfirmation between the thumbnail and the video has opposing effects: it leads to more views and higher watch time but lower post-video engagement (e.g., likes and comments). To further investigate the underlying behavioral process, we build a Bayesian learning model in a high-dimensional context in which consumers' decisions to click on a video and to continue watching it are based on their priors (the thumbnail) and their updated beliefs about the video content (the video's frames, characterized as multi-dimensional and correlated video topic proportions). We show that viewers overall prefer to watch videos longer when there is greater disconfirmation between their initial and updated content beliefs, suggesting that one role of thumbnails is to generate curiosity about what may come next in the video. In addition, viewers prefer less disconfirmation before observing the thumbnail, highlighting that the role of disconfirmation may differ before and after the thumbnail is observed. Using the model estimates, we then run counterfactual analyses to propose optimal thumbnails and compare them with current thumbnail recommendation practices to guide creators and platforms in thumbnail selection. Our framework provides a scalable and cost-effective way for companies to optimize thumbnail selection and design, with broad applicability to content summarization across other formats (e.g., book covers, movie posters).
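As an illustration of the belief-updating idea described above, the following is a minimal sketch and not the authors' model. It assumes a conjugate Dirichlet-multinomial update, which ignores the correlation across topic proportions that the abstract mentions, and it measures disconfirmation as the total variation distance between prior and posterior mean topic proportions. The function names, the three-topic setup, and all numbers are hypothetical.

```python
import numpy as np


def posterior_topic_beliefs(prior_alpha, frame_topic_counts):
    """Conjugate Dirichlet-multinomial update (a simplifying assumption).

    prior_alpha: Dirichlet pseudo-counts implied by the thumbnail.
    frame_topic_counts: number of watched frames assigned to each topic.
    """
    prior_alpha = np.asarray(prior_alpha, dtype=float)
    counts = np.asarray(frame_topic_counts, dtype=float)
    return prior_alpha + counts


def mean_proportions(alpha):
    """Expected topic proportions under a Dirichlet with parameters alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


def disconfirmation(prior_alpha, posterior_alpha):
    """Total variation distance between prior and posterior mean beliefs.

    This distance is a hypothetical stand-in for the paper's
    theory-based disconfirmation measure.
    """
    p = mean_proportions(prior_alpha)
    q = mean_proportions(posterior_alpha)
    return 0.5 * np.abs(p - q).sum()


# Hypothetical example: the thumbnail suggests topic 0 dominates,
# but the watched frames are mostly about topics 1 and 2.
thumbnail_alpha = np.array([6.0, 2.0, 2.0])
frame_counts = np.array([2, 10, 8])

posterior_alpha = posterior_topic_beliefs(thumbnail_alpha, frame_counts)
score = disconfirmation(thumbnail_alpha, posterior_alpha)
print(f"posterior mean: {mean_proportions(posterior_alpha)}")
print(f"disconfirmation: {score:.3f}")
```

A score near 0 means the frames confirm what the thumbnail suggested, while a score near 1 means beliefs shifted almost entirely to other topics. Capturing correlated topic proportions, as the abstract describes, would require a richer prior than the Dirichlet used here.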
Video content, expectation-disconfirmation, theory of curiosity, computer vision, experiments, learning models