
World Reconstruction for ML-based Construction Analytics

For the Hacktech x Ironsite Challenge, we built and iterated on a set of use cases based on recent advances in computer vision.

Agentic 3D scene navigation

Upload any video → feed-forward 3D reconstruction (Pi³ / VGGT) → ask anything about the scene in natural language → watch a tool-using vLLM agent navigate the point cloud, render evidence views, and ground its answer with a bounding box on the best source frame.
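The grounding step can be pictured as projecting the 3D points behind an answer into a source frame and taking their pixel extent. The sketch below is a minimal illustration, not the demo's actual code: it assumes a simple pinhole camera, assumes the points are already expressed in that frame's camera coordinates, and uses hypothetical names (`project_point`, `bbox_in_frame`). The returned hit count is one plausible way to rank candidate frames when picking the "best" one.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project one 3D point (camera coordinates) to pixel coordinates.

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def bbox_in_frame(points_cam, fx, fy, cx, cy, width, height):
    """Return ((u_min, v_min, u_max, v_max), visible_count) for one frame.

    Only points that project inside the image bounds are counted.
    Returns (None, 0) when nothing is visible.
    """
    pixels = []
    for point in points_cam:
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v))
    if not pixels:
        return None, 0
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs)), len(pixels)


# Toy example (assumed values): three points in front of the camera, one behind.
points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (0.0, 0.0, -1.0)]
box, visible = bbox_in_frame(points, fx=100.0, fy=100.0, cx=50.0, cy=50.0,
                             width=200, height=200)
```

Running the same function over every candidate frame and keeping the one with the highest `visible` count would be one simple frame-selection heuristic under these assumptions.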

Pi³ + VGGT · OWLv2 + SigLIP · Gemma 4 (multimodal) · 12 spatial tools

▶ Open demo 📄 Report

Photo: C Dustin / Unsplash

Ironworld scene memory

Ask questions about narrated construction-site footage and get an automated safety analysis.

Live · 14 scenes · 32 OSHA citations

▶ Video demos 📄 Report

— — — Additional Experiments — — —

HandClust · egocentric task discovery

Automatically discovers and groups similar work activities from first-person construction video. Per-clip activity clustering with WiLoR-augmented hand-skeleton overlay, Gemini cluster labels, and a 3-D embedding view.

5 clips · 1,800 segments · 21-47% hand coverage

▶ Video demos 📄 Report

Hand + tool segmentation

Zero-shot text-prompt segmentation of hands (gloved or bare), PPE, and held tools on construction footage. Compares SAM 3.1 image, SAM 3.1 multiplex video tracker, OWLv2 → SAM 2.1, and MediaPipe → SAM 2.1.

SAM 3.1 multiplex · 50 frames · 6 s tracking clip

▶ Open viewer

Movement clustering · 6-modality ablation

When the hand is gone for 70-90% of frames, what else can we cluster on? Side-by-side comparison of V-JEPA, OWLv2 tool-presence, optical-flow rhythm, ego-motion, body-pose, and a late-fusion baseline across all 5 featured demo clips.
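To make the late-fusion idea concrete, here is a minimal sketch of score-level fusion: each modality gets its own pairwise distance matrix over the video segments, and the matrices are averaged into one that a clustering step could consume. This illustrates the general technique only. The cosine metric, the equal default weights, and the names `distance_matrix` and `late_fuse` are assumptions, not details taken from the viewer.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(features):
    """Pairwise cosine distances between segment feature vectors."""
    n = len(features)
    return [[cosine_distance(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def late_fuse(modalities, weights=None):
    """Weighted average of per-modality distance matrices.

    `modalities` maps a modality name to a list of per-segment feature
    vectors; every modality must cover the same segments in the same order.
    """
    names = sorted(modalities)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumed: equal weighting
    total = sum(weights[name] for name in names)
    n = len(modalities[names[0]])
    fused = [[0.0] * n for _ in range(n)]
    for name in names:
        dist = distance_matrix(modalities[name])
        w = weights[name] / total
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * dist[i][j]
    return fused


# Toy features (made up): segments 0 and 1 agree in both modalities, 2 differs.
toy = {
    "optical_flow": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "ego_motion": [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
}
fused = late_fuse(toy)
```

Because fusion happens on distances rather than raw features, modalities with very different dimensionalities can be combined without one dominating, and a modality that is unreliable for a clip can simply be given a lower weight.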

6 modalities · 5 clips · RepNet stand-in

▶ Open viewer

3D-recon shootout · VGGT vs alternatives

The same 32 frames per clip, fed through ~20 feed-forward / SLAM / SfM / Gaussian / mono-depth methods. Side-by-side point clouds, runtimes, and quality. The headline question: is anything actually better than VGGT for Ironsite footage?

3 clips · ~20 methods · Plotly 3D

▶ Open viewer