Immersive 3D Video

This project is a recently started collaboration with the CITRIS Tele-Immersion Lab at UC Berkeley. The Tele-Immersion Lab is developing truly immersive 3D video capture systems using clusters of video cameras and advanced depth reconstruction algorithms. The current setup consists of 12 clusters of 4 cameras each (one color camera and three black-and-white cameras for depth reconstruction), arranged to capture video inside a 10'x10' area. Each camera cluster is connected to a computer that reads the video streams from all four cameras and reconstructs a single colored depth image. These depth images are then transmitted to a central server, from which they can be sent over the Internet in real time.
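
To make the data flow concrete, the following minimal C++ sketch shows what one such colored depth image might look like as it travels from a camera cluster to the central server. The struct name, field names, and layout are illustrative assumptions; the system's actual wire format is not documented here.

#include <cstdint>
#include <vector>

// One frame from a single camera cluster: per-pixel color from the color
// camera and per-pixel depth reconstructed from the three black-and-white
// cameras, plus a timestamp so the server can keep the twelve streams
// synchronized. (Hypothetical layout, for illustration only.)
struct DepthFrame
{
    std::uint32_t clusterId;         // which of the 12 camera clusters produced this frame
    double timestamp;                // capture time, used for stream synchronization
    std::uint32_t width, height;     // image dimensions
    std::vector<std::uint8_t> rgb;   // width*height*3 color bytes
    std::vector<float> depth;        // width*height depth values; <= 0 marks "no depth"
};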

We joined the project to improve the display component of the system, which receives a set of synchronized depth image streams from one or more capturing sites and recombines and displays the individual streams on a normal desktop computer or in an immersive environment (shown in Figure 1).

Our main focus right now is to encapsulate the 3D video viewer as a plug-in to be used alongside other VR applications (see the collaboration experiment movie on the Movies page). This will allow us to create collaborative applications in which multiple users at remote sites can explore and interact with the same 3D data, and also see and interact with each other in real time. We believe that this integration will provide a breakthrough in collaborative data exploration, and move video conferencing and shared whiteboards into the 21st century.
Figure 1: A (pre-recorded) 3D video stream displayed in the KeckCAVES immersive environment. The immersive display allows an observer to interact with (streaming real-time) video data as if the recorded persons were in the same room.

Project Goals

The goal of our side of the project is to create a display engine for 3D video streams that can produce the best possible 3D reconstruction of a captured scene or person, given the technical limitations of the capturing process itself. The native format of the video streams is a set of individual depth images (one per camera cluster), each containing an array of pixels with color and depth information. The naive reconstruction approach reprojects each of these depth pixels individually into world space and renders them as (fat) points or splats. This approach results in fairly poor reconstructions, as the human visual system has trouble perceiving the captured shapes once they dissolve into unconnected point clouds.
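
As an illustration of this naive approach, the following C++ sketch back-projects every valid depth pixel through a simple pinhole camera model and collects the resulting colored points. The intrinsics structure, function name, and the "depth <= 0 means invalid" convention are assumptions for the sake of the example, not the system's actual code; a per-cluster rigid transform would still be needed to move the points from camera space into the shared world space.

#include <cstdint>
#include <vector>

// Hypothetical pinhole intrinsics for one camera cluster; the real system
// uses its own calibration data.
struct Intrinsics
{
    float fx, fy;  // focal lengths in pixels
    float cx, cy;  // principal point in pixels
};

// A reprojected depth pixel: a position in the cluster's camera space plus its color.
struct SplatVertex
{
    float x, y, z;
    std::uint8_t r, g, b;
};

// Naive reconstruction: turn every valid depth pixel into one point/splat.
std::vector<SplatVertex> reprojectDepthImage(const std::vector<float>& depth,
                                             const std::vector<std::uint8_t>& rgb,
                                             int width, int height,
                                             const Intrinsics& K)
{
    std::vector<SplatVertex> splats;
    splats.reserve(depth.size());
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u)
        {
            int i = v * width + u;
            float d = depth[i];
            if (d <= 0.0f)
                continue;  // no depth could be reconstructed for this pixel
            SplatVertex s;
            s.x = (u - K.cx) / K.fx * d;  // back-project through the pinhole model
            s.y = (v - K.cy) / K.fy * d;
            s.z = d;
            s.r = rgb[3 * i];
            s.g = rgb[3 * i + 1];
            s.b = rgb[3 * i + 2];
            splats.push_back(s);
        }
    return splats;
}

The resulting vertices are then drawn as point primitives or screen-aligned splats; because neighboring pixels are never connected, the captured surfaces tend to fall apart into the unconnected point clouds described above.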

Project Status

We initially implemented the simple point cloud rendering algorithm described above, with the expected results. Our current approach triangulates each individual depth image on the fly and in real time, and renders the resulting triangles. We will refine this method in the future and use the programmability of modern graphics processing units (GPUs) to improve the speed of reconstruction and reduce the latency in a two-way real-time videoconference.
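
As a sketch of what such on-the-fly triangulation can look like, the following C++ function connects the pixel grid of one depth image into triangles, skipping any quad whose corners straddle a depth discontinuity so that foreground and background are not stitched together. The function name and the discontinuity threshold are illustrative assumptions; the actual renderer, in particular its GPU version, may proceed differently.

#include <cmath>
#include <cstdint>
#include <vector>

// Build a triangle index list over a width x height depth image by connecting
// each pixel to its right and bottom neighbors. Indices refer to a vertex
// array holding one reprojected vertex per pixel; invalid pixels are simply
// never referenced. The 0.05 m jump threshold is an example value only.
std::vector<std::uint32_t> triangulateDepthImage(const std::vector<float>& depth,
                                                 int width, int height,
                                                 float maxDepthJump = 0.05f)
{
    std::vector<std::uint32_t> indices;
    auto valid = [&](int i) { return depth[i] > 0.0f; };
    auto close = [&](int i0, int i1) { return std::fabs(depth[i0] - depth[i1]) <= maxDepthJump; };
    for (int v = 0; v + 1 < height; ++v)
        for (int u = 0; u + 1 < width; ++u)
        {
            int i00 = v * width + u, i10 = i00 + 1;
            int i01 = i00 + width,   i11 = i01 + 1;
            if (!(valid(i00) && valid(i10) && valid(i01) && valid(i11)))
                continue;  // need depth at all four corners of the quad
            if (!(close(i00, i10) && close(i00, i01) && close(i11, i10) && close(i11, i01)))
                continue;  // quad crosses a depth discontinuity; leave a hole instead
            // Two triangles per quad.
            indices.insert(indices.end(), {std::uint32_t(i00), std::uint32_t(i01), std::uint32_t(i10)});
            indices.insert(indices.end(), {std::uint32_t(i10), std::uint32_t(i01), std::uint32_t(i11)});
        }
    return indices;
}

Because this index list has to be rebuilt for every incoming frame, moving the work onto programmable graphics hardware, as described above, is a natural way to improve reconstruction speed and reduce latency.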

We now have a new GPU-based renderer for 3D video streams, programmed by Tony Bernardin. We recently performed a collaborative visualization experiment with UC Berkeley, in which we used a new Vrui collaboration infrastructure with an embedded 3D video renderer and a shared version of 3D Visualizer to provide spatially distributed users with a shared immersive workspace.

Pages in This Section

Pictures
Screen shots and photographs showing the current implementation of the 3D video display engine.
Movies
Movies showing the 3D video rendering engine in immersive environments.