Engineering

Digital Twins for Training: How We Build Simulation Environments in Omniverse

A technical walkthrough of how datadoo builds Digital Twins in NVIDIA Omniverse and uses them to generate physics-accurate training data at scale.

datadoo research

Apr 8, 2026 · 4 min read


Every synthetic dataset we produce starts in the same place: a Digital Twin. Not a 3D model. Not a game level. A physics-accurate replica of the environment where the trained model will eventually operate. The distinction matters because it determines whether the training data transfers to the real world or falls short.

Scene composition in USD

A Digital Twin in our pipeline is a USD (Universal Scene Description) scene built in NVIDIA Omniverse. USD is an interchange format that lets us compose scenes from modular assets, define physical properties per object, set up sensor configurations, and version-control the entire environment. It was originally developed by Pixar for film production; Omniverse adapts it for simulation.
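Modular composition rests on layering: USD combines several layers into one scene, and when two layers hold an opinion about the same attribute, the stronger layer wins. The sketch below is a toy model of that single rule in plain Python, not the `pxr` API, and it ignores references, variants and payloads. The prim paths and attribute names are hypothetical.

```python
from typing import Any

# Toy model only: a "layer" maps a prim path to its attribute opinions.
# Real USD composition (references, variants, payloads) is far richer.
Layer = dict[str, dict[str, Any]]

def compose(layer_stack: list[Layer]) -> Layer:
    """Flatten layers ordered strongest-first: for each attribute,
    the strongest layer that has an opinion wins."""
    composed: Layer = {}
    # Walk weakest to strongest so stronger opinions overwrite weaker ones.
    for layer in reversed(layer_stack):
        for prim_path, attrs in layer.items():
            composed.setdefault(prim_path, {}).update(attrs)
    return composed

# Hypothetical base scene and a stronger layer that tunes one physics value.
base_layout: Layer = {
    "/World/Warehouse/Box_01": {"mass_kg": 2.0, "friction": 0.5},
    "/World/Warehouse/Forklift": {"mass_kg": 3500.0},
}
physics_override: Layer = {
    "/World/Warehouse/Box_01": {"friction": 0.35},
}

scene = compose([physics_override, base_layout])
print(scene["/World/Warehouse/Box_01"])  # {'mass_kg': 2.0, 'friction': 0.35}
```

Keeping overrides in their own layer leaves the base layout untouched, which is what makes per-variant edits cheap to version.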

The construction process follows a consistent pattern across all domains. First, we define the environment layout: road networks for driving scenes, warehouse floor plans for robotics, operating room geometry for medical. These layouts can come from CAD data, point clouds, or manual specification. The result is a spatial skeleton that defines where everything goes.
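To make "spatial skeleton" concrete, here is a minimal sketch for a warehouse floor plan, assuming racks run in straight rows separated by aisles. The function, its parameters and the slot naming are hypothetical illustrations; a real layout would be derived from CAD data or point clouds rather than generated this way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    """A named placement point in the spatial skeleton (metres, floor plane)."""
    name: str
    x: float
    y: float

def rack_skeleton(floor_length: float, floor_width: float,
                  rack_depth: float, aisle_width: float,
                  bay_length: float) -> list[Slot]:
    """Place rack rows across the floor with an aisle between neighbouring
    rows; each slot marks the centre of one rack bay."""
    row_pitch = rack_depth + aisle_width
    # n rows fit when n * rack_depth + (n - 1) * aisle_width <= floor_width.
    n_rows = int((floor_width + aisle_width) // row_pitch)
    n_bays = int(floor_length // bay_length)
    slots = []
    for r in range(n_rows):
        y = r * row_pitch + rack_depth / 2
        for b in range(n_bays):
            x = b * bay_length + bay_length / 2
            slots.append(Slot(f"rack_r{r}_b{b}", x, y))
    return slots

slots = rack_skeleton(floor_length=20.0, floor_width=10.0,
                      rack_depth=1.2, aisle_width=3.0, bay_length=2.7)
print(len(slots))  # 21 (3 rows x 7 bays)
```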

Physical properties, not just geometry

Second, we populate the environment with assets. Every asset has physical properties attached: mass, friction coefficient, material BRDF (bidirectional reflectance distribution function), deformation model. A cardboard box in a warehouse scene is not just a textured cube. It has a specific weight, a specific surface roughness, and a specific way it deforms when a robotic gripper applies pressure. These properties are what make the difference between training data that transfers and training data that does not.
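One way to keep these properties honest is to attach them as a typed record that refuses physically meaningless values. The sketch below is a hypothetical schema, not our internal format: the field names are illustrative, the single Young's modulus stands in for a simple linear-elastic deformation model, and the numbers in the example are placeholders, not measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalProperties:
    """Hypothetical per-asset physics record; field names are illustrative."""
    mass_kg: float
    static_friction: float
    dynamic_friction: float
    roughness: float          # BRDF roughness parameter in [0, 1]
    youngs_modulus_pa: float  # stiffness for a simple linear-elastic model

    def __post_init__(self) -> None:
        # Reject values outside physically meaningful ranges.
        if self.mass_kg <= 0:
            raise ValueError("mass must be positive")
        # Common sanity check: kinetic friction rarely exceeds static friction.
        if not 0.0 <= self.dynamic_friction <= self.static_friction:
            raise ValueError("expected 0 <= dynamic friction <= static friction")
        if not 0.0 <= self.roughness <= 1.0:
            raise ValueError("roughness must lie in [0, 1]")
        if self.youngs_modulus_pa <= 0:
            raise ValueError("Young's modulus must be positive")

# Placeholder values for a cardboard box, not measured data.
box = PhysicalProperties(mass_kg=1.8, static_friction=0.5,
                         dynamic_friction=0.4, roughness=0.8,
                         youngs_modulus_pa=1.0e9)
```

Validating at construction time means a bad value fails when the asset is authored, not after thousands of frames have been rendered.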

Sensor configuration and domain randomization

Third, we configure the sensor suite. For autonomous driving, this means cameras with specific lens models, LiDAR sensors with accurate beam patterns, and radar with correct reflection characteristics. For robotics, it might be depth cameras with realistic noise profiles. For medical imaging, it is X-ray or CT sensor models with accurate tissue attenuation. The sensor configuration must match the deployment hardware exactly.
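As an example of what a "specific lens model" means in practice, the sketch below projects a 3D point into pixel coordinates using a pinhole camera with two radial distortion coefficients (the radial terms of the Brown-Conrady model). It is a simplified assumption: the article does not say which lens models the pipeline uses, wide-angle or fisheye lenses need other models, and the intrinsics in the example are hypothetical rather than taken from a calibrated deployment camera.

```python
def project_point(point_cam: tuple[float, float, float],
                  fx: float, fy: float, cx: float, cy: float,
                  k1: float = 0.0, k2: float = 0.0) -> tuple[float, float]:
    """Project a point in camera coordinates (x right, y down, z forward)
    to pixel coordinates with a pinhole model plus radial distortion."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Normalised image-plane coordinates.
    xn, yn = x / z, y / z
    # Radial distortion scales by 1 + k1*r^2 + k2*r^4.
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    # Focal lengths and principal point map to pixels.
    return fx * xn * scale + cx, fy * yn * scale + cy

# Hypothetical 1920x1080 camera with mild barrel distortion (k1 < 0).
u, v = project_point((1.0, 0.0, 2.0), fx=1000.0, fy=1000.0,
                     cx=960.0, cy=540.0, k1=-0.1)
# Without distortion u would be 1460.0; with k1 = -0.1 it is about 1447.5.
```

A mismatch of even a few pixels in these coefficients shifts every rendered label relative to what the real camera sees, which is why the configuration must match the deployment hardware.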

Fourth, we define the variation space. This is where the power of synthetic data becomes tangible. A single Digital Twin can generate a practically unlimited number of dataset variants by randomizing parameters within physically valid ranges:

Time of day, weather conditions, and lighting intensity

Object positions, orientations, and counts

Material textures and surface properties

Occlusion patterns and scene clutter

NVIDIA Omniverse Replicator handles this domain randomization at scale, producing thousands of unique frames per hour from a single base scene.
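The core idea of domain randomization can be sketched without Replicator: define a bounded range for each parameter in the list above and sample every frame's settings from a seeded generator. This is plain Python, not Replicator's API, and the ranges are hypothetical stand-ins for whatever counts as "physically valid" in a given domain.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SceneVariant:
    """One sampled point in the variation space."""
    sun_elevation_deg: float           # time of day
    fog_density: float                 # weather
    light_intensity_lux: float         # lighting intensity
    object_yaw_deg: tuple[float, ...]  # one orientation per object
    clutter_fraction: float            # share of free floor with distractors

# Hypothetical bounds; real valid ranges depend on the domain.
SUN_ELEVATION_DEG = (-10.0, 90.0)
FOG_DENSITY = (0.0, 0.05)
LIGHT_LUX = (50.0, 100_000.0)
OBJECT_COUNT = (5, 40)
CLUTTER = (0.0, 0.6)

def sample_variant(rng: random.Random) -> SceneVariant:
    """Draw every parameter independently from its allowed range."""
    n_objects = rng.randint(*OBJECT_COUNT)  # inclusive on both ends
    # Illuminance spans orders of magnitude, so sample it log-uniformly.
    log_lux = rng.uniform(math.log(LIGHT_LUX[0]), math.log(LIGHT_LUX[1]))
    return SceneVariant(
        sun_elevation_deg=rng.uniform(*SUN_ELEVATION_DEG),
        fog_density=rng.uniform(*FOG_DENSITY),
        light_intensity_lux=math.exp(log_lux),
        object_yaw_deg=tuple(rng.uniform(0.0, 360.0) for _ in range(n_objects)),
        clutter_fraction=rng.uniform(*CLUTTER),
    )

# A fixed seed makes the whole variant list reproducible.
rng = random.Random(42)
variants = [sample_variant(rng) for _ in range(1000)]
```

Seeding matters as much as the ranges: re-running with the same seed regenerates the exact dataset, so a surprising training result can be traced back to the frames that produced it.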

Bridging the visual gap with Cosmos-Transfer

The fifth step is where NVIDIA Cosmos-Transfer enters the pipeline. Cosmos-Transfer applies style transfer at the rendering level, bridging the visual gap between our physically accurate renders and the target deployment domain. If the client operates cameras with a specific color profile, or their factory has a particular lighting spectrum, Cosmos-Transfer adapts the rendered output to match without compromising the underlying physics.
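Cosmos-Transfer is a learned model and is not reproduced here. As a much simpler stand-in, the sketch below shows the property the paragraph relies on: appearance is adapted to a target camera profile while the labels, which come from the scene graph, pass through untouched. The 3x3 colour-correction matrix and the frame layout are hypothetical.

```python
Pixel = tuple[float, float, float]

def apply_color_matrix(image: list[list[Pixel]],
                       matrix: list[list[float]]) -> list[list[Pixel]]:
    """Multiply every RGB pixel (values in [0, 1]) by a 3x3 matrix, then clip."""
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            mixed = tuple(
                min(1.0, max(0.0, sum(matrix[i][j] * pixel[j] for j in range(3))))
                for i in range(3)
            )
            new_row.append(mixed)
        out.append(new_row)
    return out

def adapt_frame(frame: dict, matrix: list[list[float]]) -> dict:
    """Change the image's appearance; copy every label through unchanged."""
    return {**frame, "rgb": apply_color_matrix(frame["rgb"], matrix)}

# Hypothetical "warm" camera profile: boost red slightly, cut blue slightly.
WARM = [[1.1, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.9]]

frame = {"rgb": [[(0.5, 0.5, 0.5), (1.0, 0.2, 0.8)]],
         "boxes": {7: (0, 0, 1, 0)}}
adapted = adapt_frame(frame, WARM)
```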

Auto-labeled, pixel-perfect, zero noise

Every frame that comes out of this pipeline is auto-labeled. Bounding boxes, segmentation masks, depth maps, instance IDs, surface normals. All pixel-perfect, all generated from the scene graph. No human annotation required. No label noise. No inter-annotator disagreement.
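Because the renderer knows which object produced each pixel, labels can be computed rather than drawn by hand. A minimal sketch, assuming the renderer emits a per-pixel instance ID mask with 0 as background: tight 2D bounding boxes follow exactly from the extent of each ID. The mask format is an assumption for illustration; Replicator can also emit bounding boxes directly.

```python
def boxes_from_instance_mask(
        mask: list[list[int]],
        background: int = 0) -> dict[int, tuple[int, int, int, int]]:
    """Return {instance_id: (x_min, y_min, x_max, y_max)}, inclusive pixels."""
    boxes: dict[int, list[int]] = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == background:
                continue
            if inst not in boxes:
                boxes[inst] = [x, y, x, y]
            else:
                box = boxes[inst]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return {inst: tuple(box) for inst, box in boxes.items()}

mask = [
    [0, 0, 0, 0, 0],
    [0, 7, 7, 0, 0],
    [0, 7, 7, 0, 3],
    [0, 0, 0, 0, 3],
]
print(boxes_from_instance_mask(mask))  # {7: (1, 1, 2, 2), 3: (4, 2, 4, 3)}
```

These are boxes around visible pixels; amodal boxes for partly occluded objects would be computed from the scene geometry instead.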

The entire process from Digital Twin specification to validated dataset delivery typically takes days, not months. A new environment variant (different weather, different object placement, different sensor configuration) takes hours. This iteration speed is what allows teams to run experiments that would be impractical with real-world data collection.

From specification to dataset

We have built Digital Twins for driving scenarios, warehouse environments, assembly lines, parking lots, medical facilities, and broadcast studios. Each one follows the same methodology: USD scene composition, physical property assignment, sensor configuration, domain randomization, and automated validation. The specifics vary by domain. The process does not.
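Automated validation can start with simple consistency checks between the labels of a frame. The sketch below is a hypothetical example of such checks, not our actual validation suite: it confirms that the mask matches the image size, that every box lies inside the image, and that boxes and mask pixels refer to the same instances.

```python
def validate_frame(width: int, height: int,
                   boxes: dict[int, tuple[int, int, int, int]],
                   mask: list[list[int]]) -> list[str]:
    """Return a list of problems; an empty list means the frame passes."""
    problems = []
    if len(mask) != height or any(len(row) != width for row in mask):
        problems.append("mask size does not match image size")
    ids_in_mask = {v for row in mask for v in row if v != 0}
    for inst, (x0, y0, x1, y1) in sorted(boxes.items()):
        # Boxes use inclusive pixel coordinates, so x1 must be < width.
        if not (0 <= x0 <= x1 < width and 0 <= y0 <= y1 < height):
            problems.append(f"instance {inst}: box outside image or inverted")
        if inst not in ids_in_mask:
            problems.append(f"instance {inst}: box has no mask pixels")
    for inst in sorted(ids_in_mask - set(boxes)):
        problems.append(f"instance {inst}: mask pixels but no box")
    return problems

mask = [[0, 5, 5],
        [0, 5, 0]]
print(validate_frame(3, 2, {5: (1, 0, 2, 1)}, mask))  # []
```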

If you are building a Physical AI system and need training data that transfers to the real world, the conversation starts with your deployment environment. Tell us where your model will operate, what sensors it will use, and what objects it needs to detect. We will build the Digital Twin and generate the data.
