The same 32 frames per clip are fed through 6 methods spanning feed-forward, SLAM, SfM, Gaussian, and mono-depth approaches. Point clouds shown below are voxel-downsampled to ~60K points for browser rendering; the underlying outputs are typically 1-10M points each. All methods consume the identical input set, so the comparison is apples-to-apples.
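For readers who want to reproduce the display-side reduction, here is a minimal sketch of voxel downsampling to a point budget. The page does not say how the viewer does it, so everything here is an assumption: the function names `voxel_downsample` and `downsample_to_budget` are hypothetical, each voxel is collapsed to its centroid, and the voxel size is found by bisection until the output fits the budget.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points that fall in the same voxel with their centroid."""
    # Integer voxel coordinates for every point, measured from the bbox corner.
    keys = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # Group points by voxel; `inverse` maps each point to its voxel index.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # the returned shape varies across NumPy versions
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]

def downsample_to_budget(points: np.ndarray, target: int = 60_000,
                         iters: int = 20) -> np.ndarray:
    """Bisect on voxel size so the result has at most `target` points."""
    if len(points) <= target:
        return points
    extent = float((points.max(axis=0) - points.min(axis=0)).max())
    if extent == 0.0:
        return points[:1]  # all points coincide
    lo, hi = 0.0, extent   # a voxel as large as the bbox leaves at most 8 points
    best = voxel_downsample(points, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        out = voxel_downsample(points, mid)
        if len(out) > target:
            lo = mid             # too many points: use larger voxels
        else:
            best, hi = out, mid  # fits the budget: try smaller voxels
    return best
```

Calling `downsample_to_budget(cloud)` on an `(N, 3)` array returns a cloud of at most 60,000 points. Bisection is used because the voxel size that hits a given count depends on scene scale, which differs from clip to clip; a fixed voxel size would give very different densities across scenes.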
Clips:
scene_01 · 32 frames · 720×540 · 60s
scene_02 · 32 frames · 640×480 · 60s
scene_03 · 32 frames · 640×480 · 60s
scene_04 · 32 frames · 640×480 · 60s
scene_05 · 32 frames · 640×480 · 60s
scene_06 · 32 frames · 820×616 · 60s