08:00-08:15 am | Rakesh Ranjan (Meta) | Opening Remarks |
08:15-09:00 am | Douglas Lanman (Director of Research, Meta Reality Labs) |
[Keynote] Taking a Small Step in a Different Direction
The computer vision community has recently made rapid and significant progress on the grand challenge of novel view synthesis. New frameworks, including multiplane images, neural radiance fields, and Gaussian splatting, may ultimately provide the foundation for tomorrow's volumetric video systems. When viewed with emerging mixed reality (MR) headsets, such frameworks may unlock fully immersive forms of today's television and film content. Yet these emerging view synthesis frameworks do not fully meet the needs of MR headsets. In addition to capturing and viewing entire environments across broad viewpoint changes, MR fundamentally needs computer vision systems that can also reproject from headset-mounted sensors to the perspective of the viewer's eyes. In this talk, we aim to inspire a greater focus in the computer vision community on view synthesis algorithms that can achieve this "small step" in perspective, algorithms that may fundamentally differ from emerging frameworks because the transformation must run in real time, with limited computing resources, and at a fidelity approaching that of human vision. We start with a systems-level view of this problem, examining whether hardware modifications alone might eliminate the need for real-time view reprojection for MR, based on recent psychophysical studies determining the threshold of detectability for perspective distortions. We'll also review our latest progress on meeting this system-level challenge, covering our "neural passthrough" and "reverse passthrough" headset prototypes, as well as early demonstrations of mixed reality stylization and editing systems that can be applied in combination with real-time passthrough reprojection algorithms. We conclude by looking towards the larger problems in this space, including building volumetric capture and real-time view synthesis methods that match the limits of human perception, spanning the challenges of variable-focus, wide-field-of-view, and high-dynamic-range imaging. |
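The "small step" the keynote describes is, at its core, a depth-based reprojection: unproject each camera pixel to 3D, move it the few centimeters from the headset-mounted sensor to the eye, and project it back. A minimal sketch follows; the intrinsics, the camera-to-eye pose, and the naive z-buffered scatter are all illustrative assumptions, not Reality Labs' passthrough pipeline (which must do this in real time on limited hardware).

```python
import numpy as np

def reproject_to_eye(image, depth, K_cam, K_eye, cam_to_eye):
    """Forward-warp `image` (H, W, 3) with per-pixel `depth` (H, W, meters)
    from the headset camera's frame to the eye's frame.
    `cam_to_eye` is a hypothetical 4x4 rigid transform (camera -> eye)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Unproject pixels into 3D points in the camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    pts_cam = (pix @ np.linalg.inv(K_cam).T) * depth.reshape(-1, 1)
    # Apply the small camera-to-eye baseline (the "small step").
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    pts_eye = (pts_h @ cam_to_eye.T)[:, :3]
    # Project into the eye's image plane.
    proj = pts_eye @ K_eye.T
    z = proj[:, 2]
    uv = np.round(proj[:, :2] / np.maximum(z[:, None], 1e-6)).astype(int)
    # Z-buffered scatter: nearest surface wins; disocclusion holes remain
    # and would need inpainting in a practical system.
    out = np.zeros_like(image)
    zbuf = np.full((H, W), np.inf)
    ok = (z > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    src = image.reshape(-1, 3)
    for i in np.flatnonzero(ok):
        x, y = uv[i]
        if z[i] < zbuf[y, x]:
            zbuf[y, x] = z[i]
            out[y, x] = src[i]
    return out
```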
09:00-09:30 am | Nima Kalantari (Texas A&M University) |
Reconstructing 3D Scenes from Sparse Images
Reconstructing the visual appearance of scenes has a wide range of applications, including virtual/augmented reality, e-commerce, and video conferencing. In recent years, the field of novel view synthesis has seen significant progress with the introduction of approaches like neural radiance fields. However, accurately reconstructing 3D scenes still requires a large number of input images, which is not feasible in most practical scenarios. In this talk, I will discuss our recent efforts to reconstruct 3D scenes from only a few or even a single image. Specifically, I will first present our work on novel view synthesis from a few images using 3D Gaussian splatting. Then, I will talk about our approach to handling view-dependent highlights in single-image view synthesis. |
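The few-shot view synthesis work mentioned above builds on 3D Gaussian splatting; a hedged sketch of its core per-pixel compositing step follows. The inputs are illustrative: `alphas` would come from evaluating each projected 2D Gaussian at the pixel, and real renderers batch this over image tiles on the GPU.

```python
import numpy as np

def composite_pixel(colors, alphas, depths):
    """Front-to-back alpha compositing of the Gaussians covering one pixel.
    colors: (N, 3) RGB, alphas: (N,) effective opacities, depths: (N,)."""
    out = np.zeros(3)
    transmittance = 1.0
    for i in np.argsort(depths):          # nearest Gaussian first
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:          # early termination once opaque
            break
    return out
```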
09:30-10:00 am | Federico Tombari |
3D scene understanding with neural representations for Augmented Reality
Neural representations have shown tremendous progress and are a promising tool for novel applications in the space of Augmented and Mixed Reality. In this talk I will give an overview of the use of neural representations for AR/XR applications, with a focus on 3D scene understanding and on common tasks such as novel view synthesis, 3D semantic segmentation, and 3D asset generation. For each of these three tasks, I will first highlight some important practical limitations of current neural representations. I will then show solutions designed to overcome such limitations, including mobile novel view synthesis at high framerates, open-set 3D scene segmentation with radiance fields, and realistic 3D asset generation from text prompts. |
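One of the three tasks, open-set 3D scene segmentation with radiance fields, can be illustrated with a small sketch: if the field is trained to render language-aligned (e.g., CLIP-like) features alongside color, a free-form text prompt selects matching regions by cosine similarity. The names, shapes, and threshold below are assumptions for illustration, not the talk's actual method.

```python
import numpy as np

def open_set_mask(point_features, text_embedding, threshold=0.25):
    """point_features: (N, D) L2-normalized language-aligned features
    rendered from the radiance field; text_embedding: (D,) normalized
    embedding of a free-form prompt. Returns a boolean mask over points."""
    similarity = point_features @ text_embedding   # cosine similarity
    return similarity > threshold
```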
10:00-10:30 am | Lei Xiao (Meta Reality Labs Research) |
Exploring Neural Rendering for Mixed Reality
In the realm of Mixed Reality, the pursuit of perceptually realistic 3D reconstruction and rendering of dynamic environments represents a significant research challenge. This is a crucial step towards our ultimate aspiration of passing the Visual Turing Test on headsets. In this talk, we will share our experiences and learnings on this subject. We will touch upon a variety of specific challenges we have encountered, such as gaze-contingent rendering, real-time supersampling, real-time passthrough view synthesis, online video depth estimation, and dynamic object reconstruction. Additionally, we will share our explorations in the creative domain of 3D stylization, and our initial steps towards text-driven realistic 3D editing. |
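Gaze-contingent rendering, the first challenge listed, exploits the falloff of visual acuity with eccentricity: shade densely where the user looks and sparsely in the periphery. The sketch below is a toy illustration under made-up falloff constants, not the speaker's renderer.

```python
import numpy as np

def shading_rate(pixel_xy, gaze_xy, pixels_per_degree=20.0):
    """Samples-per-pixel in [0.1, 1.0] as a function of angular distance
    (eccentricity) between a pixel and the tracked gaze point."""
    ecc = np.linalg.norm(np.asarray(pixel_xy) - np.asarray(gaze_xy)) / pixels_per_degree
    # Full rate within ~5 degrees of the fovea, smooth falloff beyond.
    return float(np.clip(1.0 / (1.0 + 0.3 * max(ecc - 5.0, 0.0)), 0.1, 1.0))
```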
10:30-11:15 am | Poster Spotlight + Break | Location: Convention Center Arch, Exhibition Hall 4E (Posters 70-79) |
11:15-11:45 am | Natalia Neverova (Meta, GenAI) |
Generative AI for 3D content creation
Scaling XR Metaverse applications will require the development of fast, performant models for immersive content creation, capable of generating and editing individual 3D assets, animated 3D characters, and eventually whole 3D worlds. In this presentation, we will talk about the first foundational blocks we are building as part of this journey, from shape generation and texturing to the creation of full 3D assets with PBR materials, starting from textual descriptions and visuals. |
11:45-12:15 pm | Noam Aigerman (University of Montreal) |
Manipulating, Deforming and Controlling 3D Objects with Machine Learning
Production of 3D content relies on the ability to manipulate 3D objects by "deforming" them, i.e., moving around 3D points on the object: each frame in an animation sequence is a deformation of a base model; alternatively, generation of 3D shapes often relies on "sculpting" the object from other shapes through deformation, or on adding further details to an existing object. Thus, enabling neural networks to directly deform 3D objects can automate and improve such applications, making the learning of deformations a heavily researched area. However, devising learning-based methods that accurately and robustly produce deformations meeting practical application needs is a challenging and unsolved task, especially when considering less explicit 3D representations such as NeRFs, SDFs, and Gaussian splats. This talk aims to give an overview of the specific challenges that need to be overcome to build a practical framework for learning deformations, as well as the recent directions my work has taken to tackle them. |
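As a concrete (and deliberately simplified) instance of the setup the talk surveys, a small network can map each surface point, together with a latent code describing the target pose or shape, to a displacement; the deformed object is the rest shape plus the predicted offsets. This sketch is illustrative only, not the speaker's method; losses, regularization, and the harder implicit representations (NeRFs, SDFs, Gaussian splats) are exactly where the open challenges lie.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Toy learned deformation: rest-pose points + latent code -> offsets."""
    def __init__(self, latent_dim=32, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),            # per-point displacement
        )

    def forward(self, verts, code):
        """verts: (V, 3) rest-pose vertices; code: (latent_dim,)."""
        inp = torch.cat([verts, code.expand(verts.shape[0], -1)], dim=-1)
        return verts + self.net(inp)         # deformed vertices
```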
12:15-12:45 pm | Laura Leal-Taixe (Nvidia Research & TUM) |
Efficient Annotations for the Trackers of Tomorrow
Multi-object tracking is an essential task for mixed reality, which aims at seamlessly merging the virtual and the real world and therefore needs a good understanding of the dynamics of the real world. Tracking algorithms thrive on large-scale training data, but video annotation is very time-consuming, and surprisingly few works explore how to label tracking datasets efficiently and comprehensively. In this work, we introduce SPAM, a tracking data engine that provides high-quality labels with minimal human intervention. SPAM is built around two key insights: i) most tracking scenarios can be easily resolved, so we use a pre-trained model to generate high-quality pseudo-labels and reserve human involvement for the smaller subset of more difficult instances; ii) the spatiotemporal dependencies of track annotations across time can be elegantly and efficiently handled through graphs, so we use a unified graph formulation to annotate both detections and identity associations across time. Based on these insights, SPAM produces high-quality annotations at a fraction of the ground-truth labeling cost. |
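The two insights behind SPAM suggest a simple data-engine skeleton: detections become graph nodes, candidate temporal links become edges, and the model's association confidence routes each edge either to automatic pseudo-labeling or to a human queue. The structures and threshold below are illustrative stand-ins, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    frame: int
    box: tuple            # (x, y, w, h)
    score: float          # detector confidence

@dataclass
class AnnotationGraph:
    nodes: list = field(default_factory=list)   # Detection instances
    edges: list = field(default_factory=list)   # (i, j) candidate links

def route_labels(graph, assoc_conf, tau=0.9):
    """assoc_conf[(i, j)]: model confidence that nodes i and j share an
    identity. Confident links become pseudo-labels; the rest go to humans."""
    auto, to_human = [], []
    for edge in graph.edges:
        (auto if assoc_conf[edge] >= tau else to_human).append(edge)
    return auto, to_human
```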