Research

Does synthetic data
actually work?

We measured it: every run is paired against a real-data-only baseline, over 7 to 15 seeds, on a frozen test set, across 8 industries and three imaging types, from color photos to X-ray.

The result

Every hardest class moved.

On the single hardest, most starved class in each dataset, adding Synthgen synthetic data to the same real data lifted accuracy by +9 to +35 points.

Hardest class per dataset: real data only → + Synthgen (gain in pp, z)

Pill · Pharmaceutical · faulty imprint: 46% → 81%, +34.8 pp, z 5.6
Magnetic Tile · Electronics · blowhole: 56% → 85%, +29.1 pp, z 13.2
Marble · Stone · crack: 51% → 76%, +24.8 pp, z 7.5
Capsule · Pharmaceutical · crack: 38% → 62%, +23.3 pp, z 7.1
Zipper · Textile · broken teeth: 63% → 84%, +21.0 pp, z 3.4
RIAWELC · Welding NDT · porosity: 47% → 66%, +19.1 pp, z 5.2
Metal Nut · Metal hardware · scratch: 71% → 87%, +15.9 pp, z 3.8
PlantVillage · Agriculture · late blight: 50% → 64%, +13.3 pp, z 4.2
Wood · Building materials · hole: 91% → 99%, +9.3 pp, z 3.5

Per-class accuracy on the hardest class in each dataset. Paired vs real-only, 7-15 seeds, frozen test split, placebo-controlled. z = standard deviations above the paired baseline.
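One way to read a paired z value is sketched below. This is only an illustration under stated assumptions, not Synthgen's published method: it assumes z is the mean per-seed lift divided by its standard error, and the function name `paired_z` and the accuracy values in the usage lines are hypothetical.

```python
import math
import statistics


def paired_z(real_only, with_synth):
    """Return (mean lift in pp, z) from per-seed accuracies given in percent.

    ASSUMPTION: z is taken as mean paired difference / standard error of
    that difference. The page does not spell out its exact definition.
    """
    if len(real_only) != len(with_synth):
        raise ValueError("both arms need one accuracy per seed")
    if len(real_only) < 2:
        raise ValueError("need at least two seeds to estimate spread")

    # Pair each seed's real-only run with the same seed's run plus synthetic data.
    diffs = [s - r for r, s in zip(real_only, with_synth)]
    mean_lift = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # sample standard deviation of the lifts
    if spread == 0:
        raise ValueError("identical lift on every seed; z is undefined")

    standard_error = spread / math.sqrt(len(diffs))
    return mean_lift, mean_lift / standard_error


# Hypothetical per-seed accuracies (percent), 7 seeds; not measured results.
real = [45.0, 47.0, 46.0, 44.0, 48.0, 46.0, 46.0]
synth = [80.0, 83.0, 79.0, 81.0, 82.0, 80.0, 82.0]
lift, z = paired_z(real, synth)
print(f"lift = {lift:.1f} pp, z = {z:.1f}")
```

Pairing by seed removes seed-to-seed variation that both arms share, which is why a modest lift can still yield a large z when the per-seed differences are consistent.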

Data efficiency

The less real data you have, the more synthetic data pays.

With Synthgen, two labelled examples per class reach roughly the same accuracy as five real ones alone, with 60% fewer labels. The gap is widest when your real data is scarcest.

[Chart: accuracy (20% to 60%) against real labelled images per class (2, 3, 5), real data only vs + Synthgen. Annotations: same accuracy, 60% fewer labels; +18pp.]

MVTec Pill: accuracy vs real labelled images per class, paired. With + Synthgen, 2 labels per class (44.6%) come within a point of 5 real labels alone (45.4%), and the lift is largest when real data is scarcest.
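The label-saving figure follows from simple arithmetic on the numbers in the caption. A minimal sketch, where the helper name `label_savings` is hypothetical and the inputs are the caption's own values:

```python
def label_savings(labels_with_synth, labels_real_only):
    """Fraction of labels saved when fewer labels reach a comparable accuracy."""
    if labels_real_only <= 0 or labels_with_synth < 0:
        raise ValueError("label counts must be positive")
    return 1 - labels_with_synth / labels_real_only


# Values taken from the caption: 2 labels per class with + Synthgen (44.6%)
# against 5 real labels per class alone (45.4%).
savings = label_savings(2, 5)
gap_pp = 45.4 - 44.6  # remaining accuracy gap, in percentage points

print(f"labels saved: {savings:.0%}")
print(f"accuracy gap: {gap_pp:.1f} pp")
```

The two operating points are close but not identical, so the 60% saving is best read as "comparable accuracy with 3 fewer labels per class".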

Every hardest-class gain held up paired over 7 to 15 random seeds, on a frozen test set we never touched during training.

Synthgen internal benchmark

Where it fits

Strongest where you're weakest.

Synthetic data pays exactly where real data is scarce and the class is hard - the rare, costly classes you can't collect enough of.

Rare, hard classes

The appearance-based classes starved of real examples - exactly where models fail and where it costs you most.

Few-shot and cold start

The less real data you have, the more synthetic data is worth - a working model before you have collected hundreds of examples.

Weak-spot localization

Pinpointing where the defect is in the image, on exactly the starved cases the real-only model localizes worst.

Run it on your data

See the lift on your hardest class.

Bring your most-confused, least-labelled class. We will show you the gain on a pilot.