Fujitsu and Carnegie Mellon University develop AI-powered social digital twin technology with traffic data from Pittsburgh

March 7, 2024


TOKYO, March 7, 2024 /PRNewswire/ -- Fujitsu Limited and Carnegie Mellon University today announced the development of a new technology to visualize traffic situations, including people and vehicles, as part of joint research on Social Digital Twin that began in 2022. The technology uses AI to transform a 2D scene image captured by a monocular RGB camera into a digitalized 3D format. The AI estimates the 3D shape and position of people and objects, enabling high-precision visualization of dynamic 3D scenes. Starting February 22, 2024, Fujitsu and Carnegie Mellon University will conduct field trials leveraging data from intersections in Pittsburgh, USA, to verify the applicability of this technology.

This technology relies on AI that has been trained through deep learning to detect the shapes of people and objects. The system is composed of two core technologies: 1) 3D Occupancy Estimation Technology, which estimates the 3D occupancy of each object from a monocular RGB camera alone, and 2) 3D Projection Technology, which accurately locates each object within 3D scene models. Using these technologies, images of scenes densely populated with people and cars, such as intersections, can be dynamically reconstructed in 3D virtual space. This provides a crucial tool for advanced traffic analysis and potential accident prevention, covering situations that surveillance cameras could not capture. Faces and license plates are anonymized to help preserve privacy.
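To give a rough sense of what locating an object within a 3D scene from a single camera can involve, the following is a minimal sketch of one textbook approach: back-projecting an image pixel onto a flat ground plane with a pinhole camera model. It is not the method developed by Fujitsu and Carnegie Mellon University, which the release does not describe in detail. The function name `pixel_to_ground`, the flat-ground assumption, and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch_rad):
    """Back-project pixel (u, v) onto the ground plane Z = 0.

    Assumptions (illustrative only): an ideal pinhole camera with no lens
    distortion, mounted at (0, 0, cam_height) in a world frame with X right,
    Y forward, Z up, and tilted downward by pitch_rad. Returns the (X, Y)
    ground position in the same units as cam_height, or None if the viewing
    ray never reaches the ground.
    """
    # Ray direction in the camera frame (x right, y down, z along optical axis).
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    dz = 1.0

    # Rotate the ray into the world frame for a camera pitched downward.
    sin_p, cos_p = math.sin(pitch_rad), math.cos(pitch_rad)
    dir_x = dx
    dir_y = -dy * sin_p + dz * cos_p
    dir_z = -dy * cos_p - dz * sin_p

    # A ray at or above the horizon never intersects the ground.
    if dir_z >= -1e-9:
        return None

    # Scale the ray so that it drops exactly cam_height to reach Z = 0.
    t = cam_height / -dir_z
    return (t * dir_x, t * dir_y)

# Hypothetical camera: 1280x720 image, 5 m above the road, pitched down 45 degrees.
point = pixel_to_ground(640, 360, 1000.0, 1000.0, 640.0, 360.0, 5.0, math.radians(45))
```

In this hypothetical setup, the image-center pixel maps to a point on the road about 5 m in front of the camera. A real system would also need calibrated camera parameters and a way to handle objects that do not rest on the ground plane.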

Going forward, Fujitsu and Carnegie Mellon University aim to commercialize this technology by FY 2025, verifying its usefulness not only in transportation but also in smart cities and traffic safety in order to expand its scope of application.

In February 2022, Fujitsu and Carnegie Mellon University’s School of Computer Science and College of Engineering began their joint research on Social Digital Twin technology, which dynamically replicates complex interplays between people, goods, economies, and societies in 3D. These technologies enable the high-precision 3D reconstruction of objects from multiple photographs taken from videos shot from different angles. However, as the joint research proceeded, it was found that existing video analysis methods were technically insufficient to dynamically reconstruct captured images in 3D. Such reconstruction required multiple cameras, which raised issues of privacy, workload, and cost that became a barrier to social implementation.
