Computer Science Master's Project · University of Hawaiʻi at Mānoa · 2026

Comparison of State-of-the-Art
SfM 3D Reconstruction Methods

A quantitative evaluation of four state-of-the-art SfM and monocular depth methods on indoor iPhone-captured scenes, using ICP-aligned cloud-to-mesh distance metrics against stereo ground truth.

Structure from Motion · VGGT · Pi3 · DepthAnything3 · COLMAP

Methods evaluated

AI-based

VGGT

Feed-forward transformer for multi-view 3D reconstruction without iterative optimization.

AI-based

Pi3

Point map regression model for dense scene geometry from unposed image collections.

Monocular depth

DepthAnything3

Foundation model for monocular depth estimation lifted to full 3D point clouds.

Classical

COLMAP

SIFT-based SfM pipeline with MVS densification. The established classical baseline.

Dataset & pipeline

Indoor iPhone scenes

Four objects were filmed with an iPhone 15 Pro under day and night lighting (~90 frames per scene at 3 fps). Ground truth comes from an Orbbec Gemini 335Lg stereo camera (ROS 1 .bag recordings).

Ukulele Day · Ukulele Night · Artificial Plant Day · Electric Fan Day · Lab Chair
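As a rough illustration of the capture numbers above, about 90 frames at 3 fps corresponds to roughly 30 s of video per scene. The sketch below shows one way such a subsampling could be computed. The native frame rate (30 fps), the clip length, and the function name `sampled_frame_indices` are assumptions made for illustration; the project's actual extraction tool and settings are not stated here.

```python
def sampled_frame_indices(native_fps: float, target_fps: float, duration_s: float) -> list[int]:
    """Indices of the native frames kept when subsampling a clip to target_fps."""
    if target_fps <= 0 or native_fps < target_fps:
        raise ValueError("target_fps must be positive and no greater than native_fps")
    step = native_fps / target_fps            # native frames between kept frames
    total_frames = int(duration_s * native_fps)
    indices = []
    i = 0
    while round(i * step) < total_frames:
        indices.append(round(i * step))
        i += 1
    return indices


# Hypothetical clip: 30 s recorded at 30 fps, subsampled to 3 fps.
indices = sampled_frame_indices(native_fps=30, target_fps=3, duration_s=30)
print(len(indices), indices[:4])  # 90 [0, 10, 20, 30]
```

Under these assumed values, every tenth frame is kept, which gives the ~90 frames per scene quoted above.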

Evaluation pipeline
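The evaluation relies on ICP alignment of each reconstruction to the ground truth before distances are measured. The following is a minimal point-to-point ICP sketch in NumPy, not the project's actual procedure: it assumes both clouds are already at the same scale and roughly pre-aligned, uses brute-force nearest neighbours, and the function names `rigid_align` and `icp` are illustrative.

```python
import numpy as np


def rigid_align(src, dst):
    """Least-squares rotation R and translation t so that R @ src[i] + t ≈ dst[i] (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iterations=20):
    """Point-to-point ICP: alternate nearest-neighbour matching and rigid alignment."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = src.copy()
    for _ in range(iterations):
        # Brute-force nearest neighbour in dst for every point of the moving cloud.
        sq = ((current[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matches = dst[sq.argmin(axis=1)]
        R, t = rigid_align(current, matches)
        current = current @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Brute-force matching costs O(N·M) memory and time, so this is only practical for small or downsampled clouds; dedicated tools use spatial indexes for the same step.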

Scene gallery

Ukulele — Day

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

Ground Truth

VGGT

Pi3

Depth Anything 3

COLMAP

Ukulele — Night

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

Ground Truth

VGGT

Pi3

Depth Anything 3

COLMAP

Artificial Plant - Day

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

Ground Truth

VGGT

Pi3

Depth Anything 3

COLMAP

Electric Fan — Day

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

Ground Truth

VGGT

Pi3

Depth Anything 3

COLMAP

Lab Chair

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

Ground Truth

VGGT

Pi3

Depth Anything 3

COLMAP

Mannequin Head (qualitative only)

Input & Ground Truth

Input Video

Stereo Depth Ground Truth

CloudCompare results

VGGT

Pi3

Depth Anything 3

COLMAP

Results

Method	Scene	Chamfer (mm) ↓	Hausdorff (mm) ↓	Accuracy ↑	Completeness ↑	F-Score ↑	Time ↓
VGGT	Ukulele Day	10.79	1163.56	88.39%	65.93%	75.53%	19.19 s
Pi3	Ukulele Day	5.18	423.73	91.42%	62.23%	74.05%	35.26 s
DepthAnything3	Ukulele Day	10.86	268.15	88.33%	66.53%	75.90%	2.25 s
COLMAP	Ukulele Day	5.52	464.69	91.34%	71.20%	80.02%	24.802 min
VGGT	Ukulele Night	7.50	1314.51	91.17%	71.51%	80.15%	15.36 s
Pi3	Ukulele Night	23.02	1216.95	93.01%	69.49%	79.55%	23.42 s
DepthAnything3	Ukulele Night	0.17	207.94	98.69%	45.04%	61.85%	1.27 s
COLMAP	Ukulele Night	3.27	658.20	87.38%	80.01%	83.54%	24.387 min
VGGT	Artificial Plant Day	10.00	765.90	77.40%	46.61%	58.18%	13.85 s
Pi3	Artificial Plant Day	13.05	688.79	78.16%	49.75%	60.80%	21.02 s
DepthAnything3	Artificial Plant Day	0.72	132.17	98.85%	46.63%	63.37%	2.99 s
COLMAP	Artificial Plant Day	9.28	572.12	78.23%	60.80%	68.42%	21.593 min
VGGT	Electric Fan Day	2.11	1485.46	86.66%	82.11%	84.32%	20.66 s
Pi3	Electric Fan Day	6.08	1678.49	81.35%	91.46%	86.11%	34.88 s
DepthAnything3	Electric Fan Day	1.20	228.92	95.98%	39.99%	56.46%	3.42 s
COLMAP	Electric Fan Day	8.50	1277.66	80.08%	84.59%	82.27%	31.008 min
VGGT	Lab Chair	8.99	1554.96	41.69%	34.81%	37.94%	19.77 s
Pi3	Lab Chair	12.81	1116.10	23.99%	23.35%	23.66%	31.69 s
DepthAnything3	Lab Chair	19.92	534.96	64.23%	36.42%	46.48%	5.17 s
COLMAP	Lab Chair	13.12	943.01	58.72%	64.19%	61.34%	29.509 min

↓ lower is better · ↑ higher is better · distances in millimeters · Mannequin Head: qualitative only
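The metric definitions and the inlier threshold are not spelled out on this page, so the sketch below uses common conventions as assumptions: accuracy is the share of reconstructed points within a threshold of the ground truth, completeness is the reverse direction, and Chamfer is the average of the two directed mean distances. It also substitutes point-to-point distances for the cloud-to-mesh distances used in the evaluation. The threshold `tau_mm` and all function names are hypothetical. The F-score as the harmonic mean of accuracy and completeness is consistent with the table (for example, the VGGT Ukulele Day row).

```python
import numpy as np


def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def f_score(accuracy, completeness):
    """Harmonic mean of accuracy and completeness (same units in, same units out)."""
    total = accuracy + completeness
    return 2 * accuracy * completeness / total if total > 0 else 0.0


def cloud_metrics(recon, gt, tau_mm):
    """Point-to-point stand-ins for the table's metrics; tau_mm is an assumed threshold."""
    d_recon_to_gt = nearest_distances(recon, gt)
    d_gt_to_recon = nearest_distances(gt, recon)
    accuracy = float((d_recon_to_gt <= tau_mm).mean())
    completeness = float((d_gt_to_recon <= tau_mm).mean())
    return {
        "chamfer": float(0.5 * (d_recon_to_gt.mean() + d_gt_to_recon.mean())),
        "hausdorff": float(max(d_recon_to_gt.max(), d_gt_to_recon.max())),
        "accuracy": accuracy,
        "completeness": completeness,
        "f_score": f_score(accuracy, completeness),
    }


# Check against the VGGT / Ukulele Day row: accuracy 88.39%, completeness 65.93%.
print(round(f_score(88.39, 65.93), 2))  # 75.53
```

Because the Hausdorff distance is a maximum, a single stray point far from the object can dominate it, which is one reason it can be large even when Chamfer is small.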

Key findings

AI-based methods (VGGT, Pi3) reconstruct dense geometry without explicit camera calibration, trading classical robustness for speed: they finish in roughly 14 to 35 s per scene, versus 22 to 31 min for COLMAP.
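To put a number on the speed difference, the short sketch below averages the per-scene runtimes from the results table and expresses each method's mean as a speed-up over COLMAP. The values are copied from the table; the variable names are illustrative, and a mean over five scenes is only a coarse summary.

```python
from statistics import mean

# Per-scene runtimes from the results table, converted to seconds.
runtime_s = {
    "VGGT": [19.19, 15.36, 13.85, 20.66, 19.77],
    "Pi3": [35.26, 23.42, 21.02, 34.88, 31.69],
    "DepthAnything3": [2.25, 1.27, 2.99, 3.42, 5.17],
    "COLMAP": [minutes * 60 for minutes in (24.802, 24.387, 21.593, 31.008, 29.509)],
}

mean_runtime = {method: mean(times) for method, times in runtime_s.items()}
speedup = {
    method: mean_runtime["COLMAP"] / seconds
    for method, seconds in mean_runtime.items()
    if method != "COLMAP"
}

for method, factor in speedup.items():
    print(f"{method}: mean {mean_runtime[method]:.1f} s, about {factor:.0f}x faster than COLMAP")
```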

COLMAP's sparse reconstruction failed on the Mannequin Head scene because the low-texture surface yielded too few SIFT features. The scene is therefore reported as a qualitative finding only.

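Lighting conditions (day vs. night) had a measurable impact on reconstruction quality for every method on the Ukulele scene, which was captured under both conditions: from day to night, F-score rose for VGGT (75.53% to 80.15%), Pi3 (74.05% to 79.55%), and COLMAP (80.02% to 83.54%), but fell for DepthAnything3 (75.90% to 61.85%).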
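The lighting effect can be read directly from the Ukulele Day and Ukulele Night rows of the results table. The sketch below computes the night-minus-day change in F-score per method; the numbers are taken from the table and the variable names are illustrative.

```python
# F-scores (%) for the Ukulele scene, from the results table.
f_day = {"VGGT": 75.53, "Pi3": 74.05, "DepthAnything3": 75.90, "COLMAP": 80.02}
f_night = {"VGGT": 80.15, "Pi3": 79.55, "DepthAnything3": 61.85, "COLMAP": 83.54}

# Positive values mean the night capture scored higher than the day capture.
delta = {method: round(f_night[method] - f_day[method], 2) for method in f_day}

for method, change in delta.items():
    print(f"{method}: {change:+.2f} percentage points")
```

With a single scene captured under both conditions, these differences indicate sensitivity to lighting but do not establish a general trend.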

Why does this matter?

✦ Autonomous vehicles use 3D reconstruction to map environments in real time.

✦ Surgical robotics and medical imaging use 3D models for precision navigation.

✦ AR/VR and spatial computing require fast, accurate reconstruction to feel real.

✦ Disaster response drones use 3D mapping to assess structural damage.
