3D streaming gets leaner by seeing only what matters
A new approach to streaming technology may significantly improve how users experience virtual reality and augmented reality environments, according to a study from NYU Tandon School of Engineering.
The research — presented in a paper at the 16th ACM Multimedia Systems Conference on April 1, 2025 — describes a method for directly predicting visible content in immersive 3D environments, potentially reducing bandwidth requirements by up to 7-fold while maintaining visual quality.
The technology is being applied in an ongoing National Science Foundation-funded project at NYU Tandon that brings point cloud video to dance education, making 3D dance instruction streamable to standard devices at lower bandwidth.
"The fundamental challenge with streaming immersive content has always been the massive amount of data required," explained Yong Liu — professor in the Electrical and Computer Engineering Department (ECE) at NYU Tandon and faculty member at both NYU Tandon's Center for Advanced Technology in Telecommunications (CATT) and NYU WIRELESS — who led the research team. "Traditional video streaming sends everything within a frame. This new approach is more like having your eyes follow you around a room — it only processes what you're actually looking at."
The technology addresses the "Field-of-View (FoV)" challenge for immersive experiences. Current AR/VR applications demand high bandwidth — a point cloud video (which renders 3D scenes as collections of data points in space) consisting of 1 million points per frame requires more than 120 megabits per second, nearly 10 times the bandwidth of standard high-definition video.
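A rough back-of-the-envelope calculation shows where that gap comes from. The per-point bit rate and the HD video bitrate below are illustrative assumptions, not figures from the paper, but they reproduce the same order of magnitude:

```python
# Rough bandwidth estimate for point cloud video vs. HD video.
# The per-point size and HD bitrate are illustrative assumptions,
# not numbers taken from the NYU Tandon paper.

POINTS_PER_FRAME = 1_000_000   # points in one point cloud frame
BITS_PER_POINT   = 4           # assumed size per point after compression
FPS              = 30          # playback frame rate

point_cloud_mbps = POINTS_PER_FRAME * BITS_PER_POINT * FPS / 1e6
hd_video_mbps    = 12          # typical 1080p streaming bitrate (assumption)

print(f"Point cloud video: ~{point_cloud_mbps:.0f} Mbps")               # ~120 Mbps
print(f"HD video:          ~{hd_video_mbps:.0f} Mbps")
print(f"Ratio:             ~{point_cloud_mbps / hd_video_mbps:.0f}x")    # ~10x
```

Sending only the cells that will actually fall inside the viewer's field of view is what lets the system cut that figure by a large factor without degrading what the user sees.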
Unlike traditional approaches that first predict where a user will look and then calculate what's visible, this new method directly predicts content visibility in the 3D scene. By avoiding this two-step process, the approach reduces error accumulation and improves prediction accuracy.
The system divides 3D space into "cells" and treats each cell as a node in a graph network. It uses transformer-based graph neural networks to capture spatial relationships between neighboring cells, and recurrent neural networks to analyze how visibility patterns evolve over time.
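The sketch below illustrates that pipeline in broad strokes. It is not the authors' released code; the class name, feature layout, and layer sizes are assumptions chosen only to show how per-cell features can be mixed across neighboring cells with transformer-style attention and carried across frames with a recurrent unit:

```python
# Minimal sketch (not the authors' implementation) of cell-level FoV
# prediction: cells are graph nodes, attention restricted to neighbors
# captures spatial relationships, and a GRU tracks visibility over time.

import torch
import torch.nn as nn

class CellVisibilityPredictor(nn.Module):
    def __init__(self, in_dim=8, hidden_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden_dim)
        # Transformer-style attention over cells; a mask derived from the
        # adjacency matrix limits each cell to its spatial neighbors.
        self.spatial_attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                                  batch_first=True)
        # Recurrent unit carrying visibility dynamics from frame to frame.
        self.temporal = nn.GRUCell(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)   # per-cell visibility score

    def forward(self, cell_feats, adjacency):
        # cell_feats: (T, N, in_dim) per-frame features for N cells
        # adjacency:  (N, N) bool, True where two cells are neighbors
        T, N, _ = cell_feats.shape
        attn_mask = ~adjacency                  # True = attention blocked
        h = torch.zeros(N, self.temporal.hidden_size)
        for t in range(T):
            x = self.embed(cell_feats[t]).unsqueeze(0)          # (1, N, H)
            x, _ = self.spatial_attn(x, x, x, attn_mask=attn_mask)
            h = self.temporal(x.squeeze(0), h)                   # (N, H)
        # Probability that each cell is visible in the future field of view.
        return torch.sigmoid(self.head(h)).squeeze(-1)           # (N,)

# Toy usage: 10 past frames, 64 cells, random sparse neighborhoods.
T, N = 10, 64
feats = torch.randn(T, N, 8)
adj = torch.rand(N, N) < 0.1
adj = adj | adj.T | torch.eye(N, dtype=torch.bool)   # symmetric, self-loops
model = CellVisibilityPredictor()
print(model(feats, adj).shape)                        # torch.Size([64])
```

The point of predicting visibility per cell, rather than predicting a viewport first, is that the streaming client can rank cells by predicted visibility and request only the highest-ranked ones at full quality.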
For pre-recorded virtual reality experiences, the system can predict what will be visible for a user 2-5 seconds ahead, a significant improvement over previous systems that could only accurately predict a user’s FoV a fraction of a second ahead.
"What makes this work particularly interesting is the time horizon," said Liu. "Previous systems could only accurately predict what a user would see a fraction of a second ahead. This team has extended that."
The research team's approach reduces prediction errors by up to 50% compared to existing methods for long-term predictions, while maintaining real-time performance of more than 30 frames per second even for point cloud videos with over 1 million points.
For consumers, this could mean more responsive AR/VR experiences with reduced data usage, while developers can create more complex environments without requiring ultra-fast internet connections.
"We're seeing a transition where AR/VR is moving from specialized applications to consumer entertainment and everyday productivity tools," Liu said. "Bandwidth has been a constraint. This research helps address that limitation."
The researchers released their code to support continued development. Their work was supported in part by the US National Science Foundation (NSF) grant 2312839.
In addition to Liu, the paper's authors are Chen Li and Tongyu Zong, both NYU Tandon Ph.D. candidates in Electrical Engineering; Yueyu Hu, an NYU Tandon Ph.D. candidate in Electrical and Electronics Engineering; and Yao Wang, an NYU Tandon professor in ECE and the Biomedical Engineering Department and a member of CATT and NYU WIRELESS.
C. Li, T. Zong, Y. Hu, Y. Wang, Y. Liu. 2025. Spatial Visibility and Temporal Dynamics: Rethinking Field of View Prediction in Adaptive Point Cloud Video Streaming. In Proceedings of the 16th ACM Multimedia Systems Conference (MMSys '25). Association for Computing Machinery, New York, NY, USA, 24–34.