Eye movements are a reliable indicator of overt visual attention. Because multiple factors influence where people attend in images, understanding how our attention is deployed and predicting where we look are challenging problems, and the subject of ongoing research across the world. This tutorial reviews three main points:
As we expect to soon find gaze recording capabilities on hand-held devices, researchers have explored the use of eyetracking as a form of implicit user input to computational algorithms. We will review the existing literature, with a focus on attention-driven image and video editing.
In parallel to these gaze-contingent applications, we will present the strengths and weaknesses of computational models of visual attention, with a special emphasis on saccadic models.
In the third part of the tutorial, we will discuss how saliency models can improve the efficiency of graphics applications, with a focus on compression and rendering.
Full-day tutorial (3×90 minutes of presentation, with the last time slot dedicated to open discussion and the sharing of thoughts and reflections)
The study of visual attention has played an important role in graphics by informing algorithms about the perceptual priorities of viewers. Because multiple factors influence where people attend in images and videos, recording, understanding, and predicting visually salient regions is challenging, and the subject of ongoing research across the world.
This proposal consists of three parts, described hereafter:
As we expect to soon find gaze recording devices on head-mounted displays, phones, and screens, researchers have explored the use of eyetracking as a form of implicit user input to computational algorithms, because gaze is a natural indicator of what the viewer or user is interested in. As a result, several gaze-contingent applications have been proposed, such as eye-as-cursor interaction and gaze-contingent displays. We term these 'online' applications because they involve streaming gaze data [Duchowski 2002]. Research in psychophysics and vision science has shown that there is a large amount of consistency in where people look in images and videos. Thus, a corpus of eyetracking data collected offline has been used to inform algorithms for image and video processing [Walber 2014, Jain 2012, Jain 2015]. We will review the existing literature, with a focus on attention-driven image and video editing. (Eakta Jain)
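A common first step in both online and offline uses of gaze data is to group raw gaze samples into fixations. As a minimal sketch (not any specific system from the literature above), the following illustrates the classical dispersion-threshold idea: a window of consecutive samples counts as a fixation while its spatial spread stays below a threshold. The thresholds and the pixel coordinate system are illustrative assumptions.

```python
import numpy as np

def detect_fixations(gaze, max_dispersion=30.0, min_samples=10):
    """Group raw gaze samples (N x 2 array of screen coordinates, in
    pixels) into fixations with a dispersion-threshold rule: grow a
    window while its horizontal + vertical spread stays below
    max_dispersion, then report the window's centroid as a fixation."""
    def dispersion(w):
        return (w[:, 0].max() - w[:, 0].min()) + (w[:, 1].max() - w[:, 1].min())

    fixations = []
    start, n = 0, len(gaze)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(gaze[start:end]) <= max_dispersion:
            # Extend the window while the samples stay tightly clustered.
            while end < n and dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            fixations.append(gaze[start:end].mean(axis=0))
            start = end
        else:
            # No fixation starting here; slide the window forward.
            start += 1
    return fixations
```

Feeding this two clusters of synthetic samples (e.g. twenty samples near one screen location, then twenty near another) yields two fixations at the cluster centroids; a saccade between them appears simply as the gap between consecutive windows.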
In parallel to these methods, models of visual attention have been used to locate regions of interest in a scene, and to create efficient algorithms for rendering [Yee and Pattanaik 2004] and compression [Mantiuk 2003]. (Sumanta Pattanaik)
Most of these computational models of visual attention have been motivated by the seminal work of Koch and Ullman [Koch and Ullman, 1985], who proposed a plausible computational architecture to predict human gaze. A key feature of these models is the computation of a 2D static saliency map that represents the salience of a visual scene. Although the saliency map representation is a convenient way to indicate where we look within a scene, it does not completely account for the complexities of our visual system. In this second talk, we will present the strengths and weaknesses of computational models of visual attention. One obvious limitation is that these models make no assumption about eye movements and oculomotor behavioural biases. We will also discuss the methods used to evaluate performance, i.e. how to measure the degree of similarity between a prediction and the ground truth. Finally, we will introduce a new generation of computational models, termed saccadic models. Saccadic models strive both to overcome the limitations of current computational models of visual attention and to provide a more comprehensive framework for modeling visual attention. The main idea is to predict visual scanpaths, i.e. the sequence of fixations and saccades an observer would perform to sample the visual environment. Like saliency models, saccadic models have to predict, on the one hand, the salient areas of our visual environment, and, on the other hand, generate plausible visual scanpaths, i.e. scanpaths having the same peculiarities as human scanpaths [Le Meur and Liu, 2015]. (Olivier Le Meur)
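To make the jump from a static saliency map to a scanpath concrete, here is a minimal sketch of one classical mechanism in the Koch and Ullman tradition: repeatedly pick the saliency maximum (winner-take-all) and then suppress a neighbourhood around it (inhibition of return) so that the next fixation moves elsewhere. This is only an illustration of the principle; actual saccadic models such as [Le Meur and Liu, 2015] additionally model oculomotor biases in saccade amplitudes and orientations. The map values, grid size, and inhibition radius below are illustrative assumptions.

```python
import numpy as np

def generate_scanpath(saliency, n_fixations=3, ior_radius=1):
    """Generate a sequence of fixation locations from a 2D saliency
    map: winner-take-all selects the most salient cell, then
    inhibition of return zeroes a disc around it before the next
    selection."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    ys, xs = np.mgrid[0:h, 0:w]
    scanpath = []
    for _ in range(n_fixations):
        # Winner-take-all: most salient remaining location.
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        scanpath.append((int(y), int(x)))
        # Inhibition of return: suppress a disc around the winner.
        sal[(ys - y) ** 2 + (xs - x) ** 2 <= ior_radius ** 2] = 0.0
    return scanpath
```

Run on a toy map with three salient peaks, the model visits them in decreasing order of salience; without the inhibition-of-return step it would fixate the single global maximum forever, which is precisely why such a mechanism is needed.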
[Duchowski 2002] Duchowski, A. T. (2002). A breadth-first survey of eye-tracking applications. Behavior Research Methods, Instruments, & Computers, 34(4), 455-470.
[Walber 2014] Walber, T. C., Scherp, A., & Staab, S. (2014). Smart photo selection: Interpret gaze as personal interest. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (pp. 2065-2074).
[Jain 2012] Jain, E., Sheikh, Y., & Hodgins, J. (2012). Inferring artistic intention in comic art through viewer gaze. In Proceedings of the ACM Symposium on Applied Perception (pp. 55-62). ACM.
[Jain 2015] Jain, E., Sheikh, Y., Shamir, A., & Hodgins, J. (2015). Gaze-Driven Video Re-Editing. ACM Transactions on Graphics (TOG), 34(2), 21.
[Yee and Pattanaik 2004] Yee, H., & Pattanaik, S. (2004). Attention for computer graphics rendering.
[Mantiuk 2003] Mantiuk, R., Myszkowski, K., & Pattanaik, S. (2003). Attention guided MPEG compression for computer animations. In Proceedings of the 19th spring conference on Computer graphics (pp. 239-244). ACM.
[Koch and Ullman, 1985] Koch, C., & Ullman, S. (1987). Shifts in selective visual attention: towards the underlying neural circuitry. In Matters of intelligence (pp. 115-141). Springer Netherlands.
[Le Meur and Liu, 2015] Le Meur, O., & Liu, Z. (2015). Saccadic model of eye movements for free-viewing condition. Vision research.
The tutorial will be of interest to students and researchers working in computer science, cognitive science, and computer graphics. The prerequisites are kept to a minimum, and anyone with an elementary background in computer science, image processing, or computer graphics can follow this tutorial.
Olivier Le Meur obtained his PhD degree from the University of Nantes in 2005. From 1999 to 2009, he worked in the media and broadcasting industry. In 2003 he joined the research center of Thomson-Technicolor at Rennes, where he supervised a research project on the modelling of human visual attention. Since 2009 he has been an associate professor for image processing at the University of Rennes 1. In the IRISA/SIROCCO team, his research interests deal with the understanding of human visual attention, including computational modelling of visual attention and saliency-based applications (video compression, objective assessment of video quality, retargeting).
Eakta Jain is an Assistant Professor of Computer and Information Science and Engineering at the University of Florida. She received her PhD and MS degrees in Robotics from Carnegie Mellon University, and her B.Tech. degree in Electrical Engineering from IIT Kanpur. She has worked in industrial research at Texas Instruments R&D labs, Disney Research Pittsburgh, and the Walt Disney Animation Studios. Her research interests are in building human-centered computer graphics algorithms to create and manipulate artistic content, including traditional hand animation, comic art, and films. Her work has been presented at venues such as ACM SIGGRAPH, and has won multiple awards.
Sumanta N. Pattanaik is an Associate Professor of Computer Science in the School of Electrical Engineering and Computer Science at the University of Central Florida, Orlando. Previously, he was a Research Associate at the Cornell Program of Computer Graphics, Cornell University (1995-2001), a Post-Doctoral Associate at the SIAMES group of IRISA/INRIA, Rennes, France (1993-1995), and a Senior Staff Scientist at the Graphics Department of NCST, Bombay (1985-1995). He received his PhD from the Department of Computer Science and Information Science of BITS, Pilani in November 1993. His areas of interest include Computer Graphics, Virtual Reality, and Visualization. He is currently focusing on Real-time Realistic Image Synthesis.