Waymo leverages Genie 3 to create a world model for self-driving cars
Waymo's World Model builds on DeepMind's Genie 3 but is not a straight port: a specialized post-training process makes the model produce both 2D video and 3D lidar outputs of the same scene. Cameras capture fine visual detail, Waymo says, while lidar adds critical depth information to what a self-driving car "sees" on the road.
The world model lets Waymo take dashcam video from its vehicles and use prompts to alter a vehicle's route, a capability it calls driving action control. These simulations include lidar maps and, Waymo says, deliver greater realism and consistency than older reconstructive simulation methods, allowing engineers to see what would happen if a car took a different turn.
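The model can also help improve the self-driving AI without adding or removing anything in a scene. Plenty of dashcam video is available for training, but it typically lacks the multimodal sensor data that Waymo's vehicles collect. Waymo says the World Model can take such a video and generate matching sensor data, showing how its driving system would have perceived the same situation.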