Recon-method comparison

3D-recon shootout · VGGT vs 5 alternatives

The same 32 frames from each clip are fed through 6 methods spanning feed-forward, SLAM, SfM, Gaussian, and mono-depth approaches. The point clouds shown below are voxel-downsampled to ~60K points for browser rendering; the underlying outputs are typically 1-10M points each. All methods consume the identical input set, so the comparison is apples-to-apples.
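For readers unfamiliar with voxel downsampling, here is a minimal NumPy sketch of the idea: bucket points into a regular grid, keep one centroid per occupied cell, and search for a cell size that lands at or under a point budget. This is not the code behind this page. The function names `voxel_downsample` and `downsample_to_target`, the centroid rule, and the bisection over voxel size are all assumptions made for illustration.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Return one centroid per occupied voxel (hypothetical helper)."""
    points = np.asarray(points, dtype=np.float64)
    # Integer grid coordinates of each point, measured from the cloud's min corner.
    keys = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # Group points by voxel; `inverse` maps each point to its voxel's row.
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flatten, since the shape differs across NumPy versions
    # Sum the points in each voxel, then divide by the count to get centroids.
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]


def downsample_to_target(points, target=60_000, iters=20):
    """Bisect on voxel size until the output has at most `target` points.

    Assumes the cloud has non-zero extent and `target` is at least 8.
    """
    points = np.asarray(points, dtype=np.float64)
    if len(points) <= target:
        return points
    extent = float(np.ptp(points, axis=0).max())
    lo, hi = extent * 1e-6, extent      # a voxel of size `extent` leaves <= 8 cells
    best = voxel_downsample(points, hi)
    for _ in range(iters):
        mid = float(np.sqrt(lo * hi))   # geometric midpoint suits a scale search
        out = voxel_downsample(points, mid)
        if len(out) > target:
            lo = mid                    # too many points: voxels must grow
        else:
            hi, best = mid, out         # within budget: try smaller voxels
    return best
```

A call such as `downsample_to_target(cloud, target=60_000)` returns an `(M, 3)` array with `M` at or below the budget. Carrying RGB through would mean averaging colors per voxel with the same `inverse` index; that step is left out here.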

Clips:
scene_01 · 32 frames · 720×540 · 60s
scene_02 · 32 frames · 640×480 · 60s
scene_03 · 32 frames · 640×480 · 60s
scene_04 · 32 frames · 640×480 · 60s
scene_05 · 32 frames · 640×480 · 60s
scene_06 · 32 frames · 820×616 · 60s


