Professor Kristen Grauman

Title: First-Person Video for Understanding Interactions

Abstract: Today’s perception systems excel at naming things in third-person Internet photos or videos, which purposefully convey a visual scene or moment.  In contrast, first-person or “egocentric” perception requires understanding the multi-modal video that streams to a person’s (or robot’s) wearable camera.  While video from an always-on wearable camera lacks the curation of an intentional photographer, it does provide a special window into the camera wearer’s attention, goals, and interactions with people and objects in her environment.  These factors make first-person video an exciting avenue for the future of perception in augmented reality and robot learning.

Motivated by this setting, I will present our recent work on first-person video.  First, we explore learning visual affordances to anticipate how objects and spaces can be used.  We show how to transform egocentric video into a human-centric topological map of a physical space (such as a kitchen) that captures its primary zones of interaction and the activities they support.  Moving down to the object level, we develop video anticipation models that localize interaction “hotspots” indicating how/where an object can be manipulated (e.g., pressable, toggleable, etc.).  Towards translating these affordances into robot action, we prime reinforcement learning agents to prefer human-like interactions, thereby accelerating their task learning.  Finally, I will preview a new multi-institution large-scale egocentric video dataset effort.


Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director in Facebook AI Research (FAIR).  Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception.  Before joining UT-Austin in 2007, she received her Ph.D. at MIT.  She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award.  She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award).  She served for six years as an Associate Editor-in-Chief for PAMI and served as a Program Chair of CVPR 2015 and NeurIPS 2018. http://www.cs.utexas.edu/~grauman/

Professor Yousef Saad

Title: Computing, updating, and tracking invariant subspaces

Abstract: Computing invariant subspaces is at the core of many applications, from machine learning to signal processing, and control theory, to name just a few examples. Often one wishes to compute the subspace associated with eigenvalues located at one end of the spectrum, i.e., either the largest or the smallest eigenvalues. In addition, it is quite common that the data at hand undergoes frequent changes and one is required to keep updating or tracking the target invariant subspace. This talk will discuss the problem from a computational linear algebra viewpoint. It will present standard tools for computing invariant subspaces, and describe how these are adapted for situations where rapid updating is needed. We will also show the many connections that exist between different viewpoints adopted by practitioners. One of the best known techniques for computing invariant subspaces is the subspace iteration algorithm. While this algorithm tends to be slower than a Krylov subspace approach such as the Lanczos algorithm, it has many attributes that make it the method of choice in many applications. One of these attributes is its tolerance of changes in the matrix. The talk will end with a few illustrative examples.
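For readers unfamiliar with the subspace iteration algorithm named in the abstract, the following is a minimal textbook-style sketch in Python with NumPy, not the speaker's own implementation. It assumes a real symmetric matrix and targets the eigenvalues of largest magnitude; the function name `subspace_iteration`, its parameters, and the test matrix are all hypothetical choices made for illustration. The second call shows one way the "tolerance of changes in the matrix" can be exploited: after a small perturbation, the previous basis is reused as a warm start, so only a few iterations are run.

```python
import numpy as np


def subspace_iteration(A, k, Q0=None, iters=200, seed=0):
    """Approximate the k dominant eigenpairs of a symmetric matrix A.

    Hypothetical helper for illustration. Q0 is an optional n-by-k
    starting basis (for example, the result of an earlier run).
    """
    n = A.shape[0]
    if Q0 is None:
        rng = np.random.default_rng(seed)
        Q0 = rng.standard_normal((n, k))
    # Orthonormalize the starting block.
    Q, _ = np.linalg.qr(Q0)
    for _ in range(iters):
        # Multiply by A, then re-orthonormalize to keep the basis stable.
        Q, _ = np.linalg.qr(A @ Q)
    # Rayleigh-Ritz step: solve the small projected eigenproblem.
    T = Q.T @ A @ Q
    T = (T + T.T) / 2.0
    vals, vecs = np.linalg.eigh(T)  # ascending order
    # Return eigenvalues in descending order with matching vectors.
    return vals[::-1], Q @ vecs[:, ::-1]


# Assumed test matrix: symmetric, with known, well-separated eigenvalues.
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((8, 8)))
eigs = np.array([10.0, 8.0, 6.0, 1.0, 0.8, 0.5, 0.2, 0.1])
A = V @ np.diag(eigs) @ V.T

vals, U = subspace_iteration(A, k=3)

# The matrix changes slightly; reuse the old basis U as a warm start
# and run only a handful of iterations on the updated matrix.
E = rng.standard_normal((8, 8))
E = 1e-3 * (E + E.T)
vals_new, U_new = subspace_iteration(A + E, k=3, Q0=U, iters=5)
```

The warm start is the design point of interest here: because the block `Q` is carried from one step to the next, a previous result remains a good starting guess when the matrix changes only slightly, which is not as directly available in a Krylov method that builds its basis from a single starting vector.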


Yousef Saad is a College of Science and Engineering (CSE) Distinguished Professor in the Department of Computer Science and Engineering at the University of Minnesota. He received the “Doctorat d’Etat” from the University of Grenoble (France) in 1983. He joined the University of Minnesota in 1990 as a Professor of Computer Science and a Fellow of the Minnesota Supercomputer Institute. He was head of the Department of Computer Science and Engineering from January 1997 to June 2000, and became a CSE Distinguished Professor in 2005. From 1981 to 1990, he held positions at the University of California at Berkeley, Yale, the University of Illinois, and the Research Institute for Advanced Computer Science (RIACS). His current research interests include numerical linear algebra, sparse matrix computations, iterative methods, parallel computing, numerical methods for electronic structure, and linear algebra methods in machine learning. He is the author of two monographs and over 200 journal articles. He is also the developer or co-developer of several software packages for solving sparse linear systems of equations and eigenvalue problems, including SPARSKIT, pARMS, ITSOL, and EVSL. Yousef Saad is a SIAM Fellow (class of 2010) and a Fellow of the AAAS (2011).

Professor Ravi Ramamoorthi

Title: Capturing Realistic Virtual Experiences with Light Fields

Abstract: Many applications in e-commerce, video conferencing, virtual avatars, or immersive photography and virtual/augmented reality seek to capture virtual experiences of objects or scenes from a few photographs. This can be understood within the context of light fields, the entire 4D spatial and directional field of light flowing across a scene. Capturing the light field from sparse images enables one to synthesize new views, and interact immersively with a scene.

In this talk, I will first discuss depth estimation from light field cameras, as needed for immersive 3D photography.  Light fields enable combination of cues historically treated separately in computer vision, such as correspondence, defocus and shading.  I then discuss a variety of recent approaches to view synthesis and virtual experiences my group has developed, including creating 4D light fields from only 4 corner views or even a single image taken on a standard cellphone, theoretical Nyquist-rate reductions to enable sampling from a sparse set of casually captured views, novel volumetric neural radiance field representations, and light field video.


Ravi Ramamoorthi is the Ronald L. Graham professor of Computer Science at the University of California, San Diego, and founding Director of the UC San Diego Center for Visual Computing.  He received his Ph.D. at Stanford in 2002, and earlier held tenured faculty positions at Columbia University and UC Berkeley. Prof. Ramamoorthi is an author of more than 150 refereed publications in computer graphics and computer vision, including 80+ at ACM SIGGRAPH/TOG, and has played a key role in building multi-faculty research groups that have been recognized as leaders in computer graphics and computer vision at Columbia, Berkeley and UCSD. His research has been recognized with a half-dozen early career awards, including the ACM SIGGRAPH Significant New Researcher Award in computer graphics in 2007, and the Presidential Early Career Award for Scientists and Engineers (PECASE) for his work in physics-based computer vision in 2008.   He was elevated to IEEE and ACM Fellow in 2017, and inducted into the SIGGRAPH Academy in 2019.

Prof. Ramamoorthi’s work has had substantial impact in industry, with techniques like spherical harmonic lighting being adopted in industry-standard RenderMan software, and widely used in interactive applications and movie productions. He has graduated more than 20 postdoctoral, Ph.D. and M.S. students, many of whom have taken positions at leading universities or research labs and won leading fellowships and awards, including the ACM SIGGRAPH Doctoral Dissertation Award. He has also taught the first open online course in computer graphics as one of the first nine classes on the edX platform, with more than 100,000 registrations to date and a Chinese translation available via XuetangX. He (and his course) received an inaugural edX Prize certificate for this effort in 2016 and again in 2017, as the only computer science recipient and only course to be recognized twice.

Professor Matthew Turk

Title:  Beyond Bias and Fairness in Face Recognition

Abstract: Face recognition technologies have made great progress and are being deployed in a wide variety of applications, yet they also raise serious issues with respect to privacy, bias, fairness, and misuse by companies, governments, and individuals. Some civil liberties and advocacy groups have been increasingly raising warnings and promoting legislation to ban such technologies. Many in law enforcement push back, arguing that these technologies save lives and help to make society safer. Legislative bodies are trying to decide if and how to address these issues, sometimes with limited information. As technologists, what is our role in this public debate? What should we do about it? Let’s discuss.


Matthew Turk is the President of the Toyota Technological Institute at Chicago (TTIC), a graduate academic institution that focuses on research in computer science theory, artificial intelligence, and machine learning. He was formerly a professor and chair of the Department of Computer Science at the University of California, Santa Barbara, where he co-directed the UCSB Four Eyes Lab, focused on the “four I’s” of Imaging, Interaction, and Innovative Interfaces. He has also worked at Martin Marietta Aerospace, LIFIA/ENSIMAG (Grenoble, France), Teleos Research, and Microsoft Research, where he was a founder of the Vision Technology Group. He received a BS from Virginia Tech, an MS from Carnegie Mellon University, and a PhD from the Massachusetts Institute of Technology. He co-founded an augmented reality startup company in 2014 that was acquired by PTC Vuforia in 2016. Dr. Turk has received several best paper awards, and he is an ACM Fellow, an IEEE Fellow, an IAPR Fellow, and the recipient of the 2011-2012 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies.

Professor Shang-Hong Lai

Title: Deep Multimodal Learning for Computer Vision Applications

Abstract: Humans sense the world through multimodal inputs, such as vision, audio, text, and haptics. Each modality provides a description of the object or the scene based on its unique characteristic representation. Deep learning technology has proved to be very successful for many different tasks with single-modal inputs by training models with extremely large and representative datasets. Deep multimodal learning has attracted increasing attention since model learning can benefit from data of different modalities to boost the accuracy and robustness of the deep neural network system. In this talk, I will present some real examples of computer vision applications that employ multimodal learning in the system or product development process. In the first part, I will introduce the multimodal learning approaches for training face recognition and face anti-spoofing systems, including those in Windows Hello. In the second part of my talk, I will present the text-image co-training approach that has been used for developing accurate image captioning and intelligent document understanding products in Microsoft.


Shang-Hong Lai received the Ph.D. degree from the University of Florida, Gainesville, USA in 1995. He worked at Siemens Corporate Research in Princeton, New Jersey, USA, as a member of technical staff from 1995 to 1999. In 1999, he joined the Department of Computer Science, National Tsing Hua University, Taiwan, where he is now a professor. Since the summer of 2018, Dr. Lai has been on leave from NTHU to join Microsoft AI R&D Center, Taiwan. He is currently a principal researcher at Microsoft AI R&D Center and leads a science team focusing on computer vision research for face related applications. Dr. Lai’s research interests are mainly focused on computer vision, image processing, and machine learning. He has authored more than 300 papers published in refereed international journals and conferences in these areas. In addition, he has been awarded around 30 patents for his research in computer vision. He has been involved in the organization of a number of international conferences in computer vision and related areas, including ICCV, CVPR, ECCV, ACCV, and ICIP. Furthermore, he has served as an associate editor for Pattern Recognition and Journal of Signal Processing Systems.

Panel Lead Dr. Subhro Das


Subhro Das is a Research Staff Member in AI Algorithms at the MIT-IBM Watson AI Lab, IBM Research, Cambridge MA. He is a Research Affiliate at MIT, co-leading IBM’s engagement in the Bridge pillar of MIT Quest for Intelligence. He serves as the Co-Chair of the AI Learning Professional Interest Community (PIC) at IBM Research. His research interests are in distributed learning over multi-agent networks, dynamical systems, multi-agent reinforcement learning, accelerated & adaptive optimization methods, and online learning in non-stationary environments, broadly in the areas of AI, machine learning, and statistical signal processing with applications in healthcare and social good. Before moving to Cambridge, he was a Research Scientist at IBM T.J. Watson Research Center, New York, where he worked on developing signal processing and machine learning based predictive algorithms for a broad variety of biomedical and healthcare applications. He received MS and PhD degrees in Electrical and Computer Engineering from Carnegie Mellon University in 2014 and 2016, respectively. His dissertation research was in distributed filtering and prediction of time-varying random fields, and he was advised by Prof. José M. F. Moura. He completed his Bachelors (B.Tech.) degree in Electronics & Electrical Communication Engineering at the Indian Institute of Technology Kharagpur in 2011. During the summers of 2009, 2010 and 2015, he was an intern at Ulm University (Germany), Gwangju Institute of Science & Technology (South Korea), and Bosch Research (Palo Alto, CA), respectively.

Keynote, Plenary, & Tutorial Speaker Recordings: All keynote, plenary, and tutorial talks are required by the Signal Processing Society to be recorded and included in the SPS Resource Center. Because of this, IEEE Copyright and Consent Forms must be completed by each of these speakers before they can be confirmed by the conference. Organizers are responsible for collecting the recording files and forms, and any recording costs should be planned and budgeted for. See the SPS Conference Organizer Guidelines for more detail.