Computer Science Master's Project  ·  University of Hawaiʻi at Mānoa  ·  2026

Comparison of State-of-the-Art SfM 3D Reconstruction Methods

A quantitative evaluation of four state-of-the-art SfM and monocular depth methods on indoor iPhone-captured scenes, measured with Cloud-to-Mesh distance metrics against stereo ground truth after ICP alignment.

Structure from Motion · VGGT · Pi3 · DepthAnything3 · COLMAP
Methods evaluated
VGGT (AI-based): feed-forward transformer for multi-view 3D reconstruction without iterative optimization.
Pi3 (AI-based): point-map regression model for dense scene geometry from unposed image collections.
DepthAnything3 (monocular depth): foundation model for monocular depth estimation, lifted to full 3D point clouds.
COLMAP (classical): SIFT-based SfM pipeline with MVS densification; the established classical baseline.
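DepthAnything3's entry mentions lifting monocular depth to 3D point clouds. The generic way to do that is pinhole back-projection: a pixel (u, v) with depth z maps to X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z. The sketch below assumes an undistorted pinhole camera with known intrinsics; the function name and the example intrinsics are hypothetical, and this is not DepthAnything3's own code.

```python
import numpy as np


def backproject_depth(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Lift an (H, W) depth map to an (N, 3) point cloud in camera coordinates.

    Assumes an undistorted pinhole camera. Pixels with non-finite or
    non-positive depth are dropped. Output units follow the depth units.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]


# Hypothetical 2x2 depth map and intrinsics, for illustration only.
depth = np.array([[1.0, 1.0],
                  [2.0, 0.0]])  # the 0.0 pixel is treated as missing depth
pts = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (3, 3): the zero-depth pixel is dropped
```

Applying this per frame only yields camera-frame points; fusing frames into one scene cloud would additionally need each frame's camera pose.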
Dataset & pipeline

Indoor iPhone scenes

Four objects filmed with an iPhone 15 Pro under day and night lighting (~90 frames per scene at 3 fps). Ground truth comes from an ORBBEC Gemini 335Lg stereo camera, recorded as ROS1 .bag files.
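At 3 fps, ~90 frames corresponds to roughly 30 s of footage per scene. A minimal sketch of one way to subsample a clip to that rate keeps the source frame nearest each target timestamp. The helper name is hypothetical, and the 30 fps source rate in the example is an assumption; the capture frame rate of the original video is not stated here.

```python
def sample_frame_indices(num_frames: int, source_fps: float,
                         target_fps: float) -> list[int]:
    """Indices of source frames to keep when subsampling a clip to target_fps.

    For each output timestamp k / target_fps, picks the nearest source frame.
    Hypothetical helper for illustration; assumes a constant frame rate.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps > source_fps:
        raise ValueError("target_fps cannot exceed source_fps")
    duration_s = num_frames / source_fps
    count = int(duration_s * target_fps)
    step = source_fps / target_fps  # source frames per output frame
    indices = []
    for k in range(count):
        idx = round(k * step)
        if idx < num_frames:
            indices.append(idx)
    return indices


# Assumed example: a 30 s clip recorded at 30 fps, subsampled to 3 fps.
indices = sample_frame_indices(num_frames=900, source_fps=30.0, target_fps=3.0)
print(len(indices), indices[:4])  # 90 [0, 10, 20, 30]
```

The returned indices can then be handed to whichever video decoder is in use, for example by seeking with OpenCV's `cv2.VideoCapture`.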

Methods overview
[Figure: per-scene reconstructions by each method; panels: Ukulele Day · Ukulele Night · Artificial Plant Day · Electric Fan Day · Lab Chair]

Evaluation pipeline

[Figure: evaluation pipeline diagram]
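The summary at the top notes that each reconstruction is aligned to the stereo ground truth with ICP before distances are measured. As an illustration only, here is a minimal point-to-point ICP in NumPy: brute-force nearest neighbours plus a closed-form Kabsch (SVD) update. It assumes the two clouds are already roughly pre-aligned and at the same scale; it estimates no scale (SfM outputs are generally defined only up to scale, which a similarity alignment such as Umeyama's method would address) and rejects no outliers. The function names are hypothetical and this is not the project's code; a real pipeline would more typically use a library implementation such as Open3D's `registration_icp`.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Rotation r and translation t minimising sum ||r @ src_i + t - dst_i||^2."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Flip the last singular direction if needed so r is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t


def icp(source: np.ndarray, target: np.ndarray,
        iterations: int = 50, tol: float = 1e-10):
    """Point-to-point ICP aligning source (N, 3) onto target (M, 3).

    Brute-force nearest neighbours (O(N*M) memory), rigid transform only:
    no scale estimation, no outlier rejection.
    Returns the accumulated rotation, translation, and the aligned source.
    """
    r_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # Closest target point for every source point.
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(current)), nn].mean()
        if prev_err - err < tol:  # stop once the mean squared error stops improving
            break
        prev_err = err
        r, t = kabsch(current, target[nn])
        current = current @ r.T + t
        # Compose with the transform accumulated so far.
        r_total, t_total = r @ r_total, r @ t_total + t
    return r_total, t_total, current
```

ICP only converges to the right answer when the initial nearest-neighbour matches are mostly correct, which is why a coarse pre-alignment (and, for up-to-scale reconstructions, a scale estimate) usually comes first.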
Results
Method | Scene | Chamfer (mm) ↓ | Hausdorff (mm) ↓ | Accuracy ↑ | Completeness ↑ | F-Score ↑ | Time ↓
VGGT | Ukulele Day | 10.79 | 1163.56 | 88.39% | 65.93% | 75.53% | 19.19 s
Pi3 | Ukulele Day | 5.18 | 423.73 | 91.42% | 62.23% | 74.05% | 35.26 s
DepthAnything3 | Ukulele Day | 10.86 | 268.15 | 88.33% | 66.53% | 75.90% | 2.25 s
COLMAP | Ukulele Day | 5.52 | 464.69 | 91.34% | 71.2% | 80.02% | 24.802 min
VGGT | Ukulele Night | 7.50 | 1314.51 | 91.17% | 71.51% | 80.15% | 15.36 s
Pi3 | Ukulele Night | 23.02 | 1216.95 | 93.01% | 69.49% | 79.55% | 23.42 s
DepthAnything3 | Ukulele Night | 0.17 | 207.94 | 98.69% | 45.04% | 61.85% | 1.27 s
COLMAP | Ukulele Night | 3.27 | 658.20 | 87.38% | 80.01% | 83.54% | 24.387 min
VGGT | Artificial Plant Day | 10.00 | 765.90 | 77.4% | 46.61% | 58.18% | 13.85 s
Pi3 | Artificial Plant Day | 13.05 | 688.79 | 78.16% | 49.75% | 60.80% | 21.02 s
DepthAnything3 | Artificial Plant Day | 0.72 | 132.17 | 98.85% | 46.63% | 63.37% | 2.99 s
COLMAP | Artificial Plant Day | 9.28 | 572.12 | 78.23% | 60.8% | 68.42% | 21.593 min
VGGT | Electric Fan Day | 2.11 | 1485.46 | 86.66% | 82.11% | 84.32% | 20.66 s
Pi3 | Electric Fan Day | 6.08 | 1678.49 | 81.35% | 91.46% | 86.11% | 34.88 s
DepthAnything3 | Electric Fan Day | 1.20 | 228.92 | 95.98% | 39.99% | 56.46% | 3.42 s
COLMAP | Electric Fan Day | 8.50 | 1277.66 | 80.08% | 84.59% | 82.27% | 31.008 min
VGGT | Lab Chair | 8.99 | 1554.96 | 41.69% | 34.81% | 37.94% | 19.77 s
Pi3 | Lab Chair | 12.81 | 1116.10 | 23.99% | 23.35% | 23.66% | 31.69 s
DepthAnything3 | Lab Chair | 19.92 | 534.96 | 64.23% | 36.42% | 46.48% | 5.17 s
COLMAP | Lab Chair | 13.12 | 943.01 | 58.72% | 64.19% | 61.34% | 29.509 min

↓ lower is better  ·  ↑ higher is better  ·  distances in millimeters  ·  Mannequin Head: qualitative only
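The table's columns follow common point-cloud evaluation definitions. A minimal sketch, assuming the ground-truth mesh has been sampled into points (so this is a cloud-to-cloud approximation of the cloud-to-mesh distances) and using a hypothetical inlier threshold `tau`; the project's threshold and exact Chamfer convention are not given here, and Chamfer is defined in several ways in the literature (sum vs. mean of the two directions, squared vs. unsquared distances). Accuracy is the fraction of reconstructed points within `tau` of the ground truth, completeness the fraction of ground-truth points within `tau` of the reconstruction, and F-Score their harmonic mean.

```python
import numpy as np


def nn_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Distance from each point in a (N, 3) to its nearest neighbour in b (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # brute force, O(N*M)
    return np.sqrt(d2.min(axis=1))


def cloud_metrics(pred: np.ndarray, gt: np.ndarray, tau: float) -> dict:
    """Chamfer, Hausdorff, accuracy, completeness and F-Score for two aligned clouds.

    tau is a hypothetical inlier threshold in the clouds' units (e.g. mm).
    """
    d_pred = nn_distances(pred, gt)  # reconstruction -> ground truth
    d_gt = nn_distances(gt, pred)    # ground truth -> reconstruction
    accuracy = float(np.mean(d_pred < tau))
    completeness = float(np.mean(d_gt < tau))
    denom = accuracy + completeness
    f_score = 2 * accuracy * completeness / denom if denom > 0 else 0.0
    return {
        # One common convention: average of the two directed mean distances.
        "chamfer": float((d_pred.mean() + d_gt.mean()) / 2),
        # Symmetric Hausdorff: worst nearest-neighbour distance in either direction.
        "hausdorff": float(max(d_pred.max(), d_gt.max())),
        "accuracy": accuracy,
        "completeness": completeness,
        "f_score": f_score,
    }


# Toy example (hypothetical points): 2 of 3 predicted and 2 of 4 GT points lie within tau.
gt = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
pred = np.array([[0.0, 0, 0.1], [1, 0, 0], [2, 0, 0.5]])
print(cloud_metrics(pred, gt, tau=0.2))
```

As a consistency check on the table, VGGT on Ukulele Day gives 2 × 88.39 × 65.93 / (88.39 + 65.93) ≈ 75.53%, matching its F-Score entry, so the F-Score column is consistent with the harmonic mean of Accuracy and Completeness.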

Key findings
01
AI-based methods (VGGT, Pi3) reconstruct dense geometry without explicit camera calibration, trading classical robustness for large speed gains: they finished each scene in about 14 to 35 s, versus roughly 22 to 31 min for COLMAP.
02
COLMAP's sparse reconstruction failed on the Mannequin Head scene because the low-texture surface yielded too few SIFT features, so that scene is reported as a qualitative finding only.
03
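Lighting conditions (day vs. night) had a measurable impact on reconstruction quality for every method tested. On the Ukulele, the only scene captured under both conditions, F-Score was higher at night for VGGT (75.53% → 80.15%), Pi3 (74.05% → 79.55%), and COLMAP (80.02% → 83.54%), but lower for DepthAnything3 (75.90% → 61.85%), whose completeness fell from 66.53% to 45.04%.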
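Finding 01's speed claim can be quantified from the Time column. This small sketch converts COLMAP's minutes to seconds and computes how many times faster each learned method ran on each scene. The runtimes are copied from the results table; hardware details are not stated here, so the ratios only compare the runs as reported.

```python
# Runtimes from the results table, in seconds (COLMAP's minutes converted).
RUNTIMES_S = {
    "Ukulele Day": {"VGGT": 19.19, "Pi3": 35.26, "DepthAnything3": 2.25, "COLMAP": 24.802 * 60},
    "Ukulele Night": {"VGGT": 15.36, "Pi3": 23.42, "DepthAnything3": 1.27, "COLMAP": 24.387 * 60},
    "Artificial Plant Day": {"VGGT": 13.85, "Pi3": 21.02, "DepthAnything3": 2.99, "COLMAP": 21.593 * 60},
    "Electric Fan Day": {"VGGT": 20.66, "Pi3": 34.88, "DepthAnything3": 3.42, "COLMAP": 31.008 * 60},
    "Lab Chair": {"VGGT": 19.77, "Pi3": 31.69, "DepthAnything3": 5.17, "COLMAP": 29.509 * 60},
}


def speedups_vs_colmap(runtimes: dict) -> dict:
    """Per scene, COLMAP's runtime divided by each other method's runtime."""
    return {
        scene: {m: times["COLMAP"] / t for m, t in times.items() if m != "COLMAP"}
        for scene, times in runtimes.items()
    }


speedups = speedups_vs_colmap(RUNTIMES_S)
for method in ("VGGT", "Pi3", "DepthAnything3"):
    ratios = [s[method] for s in speedups.values()]
    print(f"{method}: {min(ratios):.0f}x to {max(ratios):.0f}x faster than COLMAP")
```

Reporting the per-scene range rather than a single average keeps the comparison honest when runtimes vary between scenes.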
Challenges
[Figure: VGGT on the Mannequin Head scene]
[Figure: Pi3 on the Campus Center Bench scene]
Why does this matter?
Autonomous vehicles use 3D reconstruction to map environments in real time.
Surgical robotics and medical imaging use 3D models for precision navigation.
AR/VR and spatial computing require fast, accurate reconstruction to feel real.
Disaster response drones use 3D mapping to assess structural damage.
Paper (download) · Poster (download) · View on GitHub