datadoo
Research · GTC 2026
NVIDIA GPU Technology Conference

Detecting windshield damage with synthetic data

A research poster presented at NVIDIA GTC 2026. How we generate physically accurate synthetic imagery of laminated-glass damage with NVIDIA Replicator and Cosmos-Transfer, and train models that decide repair vs. replace without a single real frame.

Jose Castro · Miguel Clavijo · Diego Torres
datadoo research · March 2026

SDG performance

40% faster generation
4.57 s per sample (datadoo SDG)
1.4× iteration speedup
$37–41B total addressable market (TAM)

Contributions

What the paper delivers

01 · End-to-end SDG pipeline

Automated synthetic data generation for laminated glass damage with pixel-perfect labels — no real data required.

02 · Physics-accurate control

Domain control via NVIDIA Cosmos-Transfer covering weather, optics, curvature, and crack taxonomies at scale.

03 · Repair-or-replace decisions

Segmentation output measures damage and position, delivering qualified repair vs. replace decisions for insurers.

The pipeline

From asset to model, end-to-end

An automated loop from procedural asset creation through NVIDIA Omniverse, Cosmos-Transfer domain control, training, and model serving — powered entirely by synthetic data.

  1. Procedural asset generation
  2. Omniverse synthetic data generation
  3. Cosmos-Transfer domain control
  4. Training & evaluation pipeline
  5. Model serving

Abstract

Closing the Sim2Real gap for laminated glass

Windshield damage accounts for 30% of all auto-insurance claims, within a global industry worth roughly $37–41 billion. Real-world data for this domain is scarce and difficult to label because transparent and reflective surfaces are notoriously hard to capture, annotate, and validate.

We author scenes in USD and programmatically vary illumination, weather, camera pose, lens model, shutter, sensor noise, glass curvature, lamination, and damage taxonomy. NVIDIA Replicator produces RGB, depth, normals, instance masks, and metadata for automatic labeling. NVIDIA Cosmos-Transfer acts as a physical AI foundation layer that augments variability consistently, so we can scale datasets to specific environments and edge cases using natural-language prompts as an additional input.
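To make the randomization idea concrete, here is a minimal, library-free sketch of how one scene configuration might be sampled before it is handed to a renderer. It does not use the Replicator API; every field name, range, and category (`SceneParams`, `WEATHER`, `DAMAGE_TYPES`, and so on) is a hypothetical stand-in for illustration, covering only a subset of the parameters listed above, and is not the configuration used in the poster.

```python
import random
from dataclasses import dataclass, asdict

# All names, ranges, and categories here are illustrative assumptions,
# not the values used in the poster's pipeline.
WEATHER = ["clear", "overcast", "rain", "fog"]
DAMAGE_TYPES = ["chip", "bullseye", "star", "long_crack"]


@dataclass
class SceneParams:
    sun_elevation_deg: float         # illumination
    weather: str
    camera_distance_m: float         # camera pose, simplified to one scalar
    focal_length_mm: float           # lens model, simplified
    shutter_s: float
    sensor_noise_sigma: float
    glass_curvature_radius_m: float
    damage_type: str


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration from hypothetical ranges."""
    return SceneParams(
        sun_elevation_deg=rng.uniform(5.0, 85.0),
        weather=rng.choice(WEATHER),
        camera_distance_m=rng.uniform(0.3, 2.0),
        focal_length_mm=rng.uniform(18.0, 85.0),
        shutter_s=rng.choice([1 / 60, 1 / 250, 1 / 1000]),
        sensor_noise_sigma=rng.uniform(0.0, 0.05),
        glass_curvature_radius_m=rng.uniform(1.5, 6.0),
        damage_type=rng.choice(DAMAGE_TYPES),
    )


def sample_dataset(n: int, seed: int = 0) -> list[dict]:
    """Sample n scene configurations reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [asdict(sample_scene(rng)) for _ in range(n)]


if __name__ == "__main__":
    for params in sample_dataset(3, seed=42):
        print(params)
```

Seeding the generator keeps a dataset reproducible, which helps when comparing training runs across SDG iterations.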

We trained a semantic segmentation model on a synthetic dataset generated in Omniverse, bridging the Sim2Real gap without real data. Fast SDG turnaround accelerated development iteration by 1.4×, and our reworked stage randomization logic on top of Replicator generates samples ~40% faster while preserving quality with the RTX Real-Time 2.0 rendering engine.
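As a rough sanity check on these figures, the reported 4.57 s per sample can be turned into a throughput estimate. The page does not state the baseline per-sample time, so the sketch below derives it under two possible readings of "~40% faster"; both readings and all function names are assumptions for illustration only.

```python
SECONDS_PER_SAMPLE = 4.57  # reported per-sample time (datadoo SDG)
SPEEDUP = 0.40             # reported "~40% faster generation"


def samples_per_hour(seconds_per_sample: float) -> float:
    """Throughput of a single generation worker."""
    return 3600.0 / seconds_per_sample


def baseline_if_time_reduced(t_new: float, reduction: float) -> float:
    """Baseline time if 'X% faster' means per-sample time fell by X%."""
    return t_new / (1.0 - reduction)


def baseline_if_throughput_gain(t_new: float, gain: float) -> float:
    """Baseline time if 'X% faster' means throughput rose by X%."""
    return t_new * (1.0 + gain)


if __name__ == "__main__":
    print(round(samples_per_hour(SECONDS_PER_SAMPLE), 1))                   # 787.7
    print(round(baseline_if_time_reduced(SECONDS_PER_SAMPLE, SPEEDUP), 2))  # 7.62
    print(round(baseline_if_throughput_gain(SECONDS_PER_SAMPLE, SPEEDUP), 2))  # 6.4
```

Under either reading, one worker produces roughly 790 samples per hour at the reported rate; the implied baseline would be about 7.6 s or about 6.4 s per sample, depending on the interpretation.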

The resulting model supports detailed reporting from its segmentation mask: it measures the damage and its position on the windshield and issues a fully “notarized” repair vs. replace decision, which insurance companies can use as a validation comparable to a valuator's output.
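The measurement step described above could be sketched as follows: given a binary damage mask, compute area, extent, and normalized position, then apply a simple rule. This is a toy illustration, not the poster's decision logic. The thresholds, the critical-zone rectangle, and the names `assess` and `DamageReport` are hypothetical, and the sketch assumes a flat mask with a uniform millimetre-per-pixel scale, ignoring glass curvature and perspective.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real repair criteria
# depend on the insurer, jurisdiction, and applicable glass standard.
MAX_EXTENT_MM = 40.0                      # limit outside the critical zone
MAX_EXTENT_CRITICAL_MM = 10.0             # stricter limit inside it
CRITICAL_ZONE = (0.25, 0.75, 0.20, 0.80)  # x_min, x_max, y_min, y_max (normalized)


@dataclass
class DamageReport:
    area_mm2: float
    extent_mm: float       # longest side of the damage bounding box
    centroid_norm: tuple   # (x, y) in [0, 1] relative to mask size
    decision: str          # "repair", "replace", or "none"


def assess(mask, mm_per_px):
    """Measure damage in a binary mask (rows of 0/1) and apply a toy rule."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not pixels:
        return DamageReport(0.0, 0.0, (0.0, 0.0), "none")

    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    height, width = len(mask), len(mask[0])

    area_mm2 = len(pixels) * mm_per_px ** 2
    extent_px = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    extent_mm = extent_px * mm_per_px

    # Pixel centers sit at index + 0.5; normalize by mask dimensions.
    cx = (sum(xs) / len(xs) + 0.5) / width
    cy = (sum(ys) / len(ys) + 0.5) / height

    x0, x1, y0, y1 = CRITICAL_ZONE
    in_critical = x0 <= cx <= x1 and y0 <= cy <= y1
    limit = MAX_EXTENT_CRITICAL_MM if in_critical else MAX_EXTENT_MM
    decision = "replace" if extent_mm > limit else "repair"
    return DamageReport(area_mm2, extent_mm, (cx, cy), decision)


if __name__ == "__main__":
    mask = [[0] * 10 for _ in range(10)]
    mask[0][0] = mask[0][1] = mask[1][0] = mask[1][1] = 1  # small corner chip
    print(assess(mask, mm_per_px=5.0))
```

Keeping the measurements in the report alongside the decision lets a reviewer see why a given outcome was reached.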

Talk to research

Working on Physical AI?

We build synthetic datasets that obey real-world physics. Talk to our research team about your use case — robotics, insurance, autonomous vehicles, or beyond.

NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries.