Engineering

Digital Twins for Training: How We Build Simulation Environments in Omniverse

A technical walkthrough of how datadoo builds Digital Twins in NVIDIA Omniverse and uses them to generate physics-accurate training data at scale.

datadoo research
Apr 8, 2026 · 4 min read
Tags: environment, Digital Twin, NVIDIA Omniverse, training data, auto-labeled, physics-safe

Every synthetic dataset we produce starts in the same place: a Digital Twin. Not a 3D model. Not a game level. A physics-accurate replica of the environment where the trained model will eventually operate. The distinction matters because it determines whether the training data transfers to the real world or falls short.

A Digital Twin in our pipeline is a USD (Universal Scene Description) scene built in NVIDIA Omniverse. USD is the interchange format that allows us to compose scenes from modular assets, define physical properties per object, set up sensor configurations, and version-control the entire environment. It is the same format Pixar originally developed for film production, here adapted for simulation.
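The key idea behind USD composition is that scenes are stacks of layers, with stronger layers overriding opinions from weaker ones. The following is a minimal pure-Python sketch of that idea, not the actual USD API; prim paths and attribute names are illustrative.

```python
# Illustrative sketch of USD-style layer composition: stronger layers
# override per-attribute "opinions" from weaker ones, which is what lets
# us keep one base environment and stack variant layers on top of it.
def compose(*layers):
    """Merge per-prim attribute dicts; earlier arguments are stronger layers."""
    scene = {}
    for layer in reversed(layers):          # apply weakest layer first
        for prim_path, attrs in layer.items():
            scene.setdefault(prim_path, {}).update(attrs)
    return scene

base = {"/warehouse/box_01": {"size": 0.4, "material": "cardboard"}}
variant = {"/warehouse/box_01": {"material": "plastic"}}

# The variant layer wins on "material" but inherits "size" from base.
composed = compose(variant, base)
```

The same override mechanism is what makes the variant generation described below cheap: a weather or layout change is a thin layer over an unchanged base scene.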

The construction process follows a consistent pattern across all domains. First, we define the environment layout: road networks for driving scenes, warehouse floor plans for robotics, operating room geometry for medical. These layouts can come from CAD data, point clouds, or manual specification. The result is a spatial skeleton that defines where everything goes.

Second, we populate the environment with assets. Every asset has physical properties attached: mass, friction coefficient, material BRDF (bidirectional reflectance distribution function), deformation model. A cardboard box in a warehouse scene is not just a textured cube. It has a specific weight, a specific surface roughness, and a specific way it deforms when a robotic gripper applies pressure. These properties are what make the difference between training data that transfers and training data that does not.
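A per-asset property record can make this concrete. The field names below are illustrative assumptions, not the actual schema of the datadoo pipeline:

```python
from dataclasses import dataclass

# Hypothetical per-asset physical property record; fields are illustrative.
@dataclass
class PhysicalAsset:
    name: str
    mass_kg: float
    static_friction: float     # dimensionless friction coefficient
    surface_roughness: float   # BRDF roughness parameter in [0, 1]
    deformable: bool           # whether a deformation model is attached

cardboard_box = PhysicalAsset(
    name="warehouse_box",
    mass_kg=2.3,
    static_friction=0.45,
    surface_roughness=0.8,
    deformable=True,           # crushes under gripper pressure
)
```

Attaching properties like these per object is what lets the physics engine, rather than an artist's intuition, decide how a scene behaves.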

Third, we configure the sensor suite. For autonomous driving, this means cameras with specific lens models, LiDAR sensors with accurate beam patterns, and radar with correct reflection characteristics. For robotics, it might be depth cameras with realistic noise profiles. For medical imaging, it is X-ray or CT sensor models with accurate tissue attenuation. The sensor configuration must match the deployment hardware exactly.
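As a sketch of what "match the deployment hardware exactly" means in practice, here is a hypothetical depth-camera configuration compared field-for-field against its real-world counterpart. Parameter names are assumptions, not the Omniverse sensor schema:

```python
from dataclasses import dataclass

# Illustrative depth-camera configuration; field names are assumptions.
@dataclass
class DepthCameraConfig:
    resolution: tuple          # (width, height) in pixels
    fov_deg: float             # horizontal field of view
    noise_std_m: float         # Gaussian depth-noise sigma, in metres
    max_range_m: float

def matches_deployment(sim: DepthCameraConfig, real: DepthCameraConfig) -> bool:
    """The simulated sensor must match the deployment hardware exactly."""
    return sim == real

sim_cam = DepthCameraConfig((1280, 720), 87.0, 0.01, 10.0)
real_cam = DepthCameraConfig((1280, 720), 87.0, 0.01, 10.0)
```

An exact-equality check is deliberately strict: a sensor model that is "close" to the deployed hardware is one of the easiest ways to reopen the sim-to-real gap.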

Fourth, we define the variation space. This is where the power of synthetic data becomes tangible. A single Digital Twin can generate unlimited dataset variants by randomizing parameters within physically valid ranges: time of day, weather conditions, object positions, material textures, lighting intensity. NVIDIA Replicator handles this domain randomization at scale, producing thousands of unique frames per hour from a single base scene.
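The core of domain randomization is sampling each parameter inside a physically valid range. The ranges below are made up for illustration; the production pipeline delegates this to NVIDIA Replicator:

```python
import random

# Minimal domain-randomization sketch. Each parameter is sampled
# uniformly inside a physically valid range; ranges are illustrative.
RANGES = {
    "sun_elevation_deg": (0.0, 90.0),
    "fog_density": (0.0, 0.3),
    "light_intensity_lux": (100.0, 2000.0),
}

def sample_variant(rng: random.Random) -> dict:
    """Draw one scene variant: a value for every randomized parameter."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

rng = random.Random(42)                    # seeded for reproducibility
variants = [sample_variant(rng) for _ in range(1000)]
```

Seeding the generator matters: a dataset variant should be reproducible from its base scene plus a seed, not a one-off artifact.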

The fifth step is where NVIDIA Cosmos-Transfer enters the pipeline. Cosmos-Transfer applies style transfer at the rendering level, bridging the visual gap between our physically accurate renders and the target deployment domain. If the client operates cameras with a specific color profile, or their factory has a particular lighting spectrum, Cosmos-Transfer adapts the rendered output to match without compromising the underlying physics.
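Cosmos-Transfer itself is a learned model, so it cannot be reduced to a few lines. As a stand-in for the general idea of adapting rendered output to a target camera's color profile, here is the simplest possible version: per-channel gains. The gain values are made up:

```python
# NOT Cosmos-Transfer: a toy per-channel color-profile adaptation,
# shown only to illustrate what "match the target camera's color
# profile" means at the pixel level. Gain values are invented.
def adapt_color_profile(pixel, gains=(1.04, 1.00, 0.93)):
    """Apply a per-channel gain to one RGB pixel, clamped to [0, 255]."""
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

warm_shifted = adapt_color_profile((200, 180, 160))
```

The real system operates on whole frames with a learned transformation, and crucially leaves geometry, depth, and labels untouched, which is what "without compromising the underlying physics" means.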

Every frame that comes out of this pipeline is auto-labeled. Bounding boxes, segmentation masks, depth maps, instance IDs, surface normals. All pixel-perfect, all generated from the scene graph. No human annotation required. No label noise. No inter-annotator disagreement.
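Why labels come out pixel-perfect is easy to see: once the renderer knows which pixels belong to which object instance, a 2D bounding box is just a min/max over that instance's pixels. Coordinates below are illustrative:

```python
# Labels fall out of the scene graph for free: given an object's pixel
# footprint (its instance mask), the tight 2D bounding box is a min/max.
def bounding_box(pixels):
    """Tight axis-aligned box (x_min, y_min, x_max, y_max) over (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

mask_pixels = [(10, 14), (12, 20), (11, 17), (15, 15)]
box = bounding_box(mask_pixels)   # (10, 14, 15, 20)
```

Segmentation masks, depth maps, and surface normals are analogous: each is a deterministic function of the scene graph and sensor model, so there is nothing for annotators to disagree about.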

The entire process from Digital Twin specification to validated dataset delivery typically takes days, not months. A new environment variant (different weather, different object placement, different sensor configuration) takes hours. This iteration speed is what allows teams to run experiments that would be impractical with real-world data collection.

We have built Digital Twins for driving scenarios, warehouse environments, assembly lines, parking lots, medical facilities, and broadcast studios. Each one follows the same methodology: USD scene composition, physical property assignment, sensor configuration, domain randomization, and automated validation. The specifics vary by domain. The process does not.

If you are building a Physical AI system and need training data that transfers to the real world, the conversation starts with your deployment environment. Tell us where your model will operate, what sensors it will use, and what objects it needs to detect. We will build the Digital Twin and generate the data.
