Diff3F is a novel feature distiller that harnesses the expressive power of in-painting diffusion features and distills them to points on 3D surfaces. Here, the proposed features are employed for point-to-point shape correspondence between assets varying in shape, pose, species, and topology. We achieve this without any fine-tuning of the underlying diffusion models, and demonstrate results on untextured meshes, point clouds, and raw scans. Note that we show raw point-to-point correspondence, without any regularization or smoothing. Inputs are point clouds, non-manifold meshes, or 2-manifold meshes. The leftmost mesh is the source and all remaining 3D shapes are targets. Corresponding points are similarly colored.
We present Diff3F as a simple, robust, and class-agnostic feature descriptor that can be computed for untextured input shapes (meshes or point clouds).
Our method distills diffusion features from image foundational models onto input shapes. Specifically, we use the input shapes to produce depth and normal maps
as guidance for conditional image synthesis, and in the process produce (diffusion) features in 2D that we subsequently lift and aggregate on the original surface.
Our key observation is that even if the conditional image generations obtained from multi-view rendering of the input shapes are inconsistent,
the associated image features are robust and can be directly aggregated across views.
This produces semantic features on the input shapes, without requiring additional data or training.
We perform extensive experiments on multiple benchmarks (SHREC'19, SHREC'20, and TOSCA) and demonstrate that our features, being semantic instead of geometric, produce reliable correspondence across both isometrically and non-isometrically related shape families.
We present dense correspondence results on SHREC'07 [1] to showcase correspondence on shapes other than the commonly tested classes of humans and four-legged animals. We show diverse shapes encompassing isometric and highly non-isometric pairs to demonstrate the versatility of our method as a general-purpose feature descriptor. Corresponding points are similarly colored.
We decorate 3D points of a given shape, in any modality (point clouds or meshes), with rich semantic descriptors. Given the scarcity of 3D geometry data from which to learn such meaningful descriptors, we leverage foundational vision models trained on very large image datasets to obtain these features. This enables Diff3F to produce semantic descriptors in a zero-shot manner.
Method overview. We render a given shape without textures from multiple views, and the resulting renderings are in-painted by guiding ControlNet with geometric conditions; the generative features from ControlNet are fused with DINO features obtained from the textured rendering, and then unprojected onto the 3D surface. Note that the textured images obtained by conditioning ControlNet from different views can be inconsistent, but when aggregated they produce stable semantic descriptors. Please refer to our paper for more details.
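To make the aggregation step concrete, here is a minimal PyTorch sketch of how per-view 2D feature maps could be unprojected and averaged into per-vertex descriptors. The tensor shapes and the per-view projection callables are assumptions for illustration, not our released implementation.

```python
import torch
import torch.nn.functional as F

def aggregate_features(vertices, feature_maps, cameras, H, W):
    """Fuse per-view 2D feature maps into per-vertex descriptors.

    vertices:     (V, 3) surface points
    feature_maps: list of (C, H, W) feature tensors, one per rendered view
    cameras:      list of (hypothetical) callables mapping (V, 3) points to
                  ((V, 2) pixel coords, (V,) visibility mask)
    """
    V = vertices.shape[0]
    C = feature_maps[0].shape[0]
    accum = torch.zeros(V, C)
    counts = torch.zeros(V, 1)
    for feats, cam in zip(feature_maps, cameras):
        pix, visible = cam(vertices)                 # project to this view
        visible = visible.float()[:, None]           # (V, 1) mask
        u = pix[:, 0].round().clamp(0, W - 1).long()
        v = pix[:, 1].round().clamp(0, H - 1).long()
        sampled = feats[:, v, u].T                   # (V, C) features at projections
        accum += sampled * visible                   # only visible points contribute
        counts += visible
    desc = accum / counts.clamp(min=1.0)             # average across views
    return F.normalize(desc, dim=-1)                 # unit-length descriptors
```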
We showcase heatmaps to visualize our semantic descriptors. For a query point in the source (denoted by a red ball), semantically related points are highlighted in the target. The point of highest similarity in the target is indicated by a red ball.
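Such a heatmap amounts to cosine similarity in descriptor space. The following sketch, assuming precomputed descriptor tensors of shape (points, channels), returns both the per-point similarity map and the best-matching index:

```python
import torch
import torch.nn.functional as F

def similarity_heatmap(src_desc, tgt_desc, query_idx):
    """Cosine similarity between one source point's descriptor and
    every target point's descriptor.

    src_desc: (N, C) source descriptors; tgt_desc: (M, C) target descriptors
    """
    q = F.normalize(src_desc[query_idx], dim=-1)  # (C,) query descriptor
    t = F.normalize(tgt_desc, dim=-1)             # (M, C)
    sim = t @ q                                   # (M,) heatmap values
    return sim, sim.argmax()                      # heatmap and best match index
```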
K-means clustering can be directly applied to our Diff3F descriptors to extract part segments. Interestingly, we discover that the k-means centroids, extracted from one shape (e.g., human), can be used to segment another (e.g., cat), thanks to the semantic nature of our descriptors. This leads to corresponding part segmentation (arms of the human map to front legs of the cat, head maps to head, etc.) as seen in the figure below.
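A minimal sketch of this co-segmentation, assuming per-point descriptors as L2-normalized NumPy arrays (scikit-learn's KMeans uses Euclidean distance, which on normalized descriptors ranks points similarly to cosine similarity; the cluster count k is an illustrative choice):

```python
from sklearn.cluster import KMeans

def cosegment(src_desc, tgt_desc, k=6):
    """Cluster one shape's descriptors, then label a second shape's
    points with the same centroids to get corresponding part segments.

    src_desc: (N, C), tgt_desc: (M, C) L2-normalized descriptor arrays
    """
    km = KMeans(n_clusters=k, n_init=10).fit(src_desc)
    src_labels = km.labels_             # segments on the source shape
    tgt_labels = km.predict(tgt_desc)   # same centroids reused on the target
    return src_labels, tgt_labels
```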
Diff3F descriptors can be effortlessly plugged into existing geometry processing pipelines such as Functional Maps. We compare vanilla functional maps [5] using the Wave Kernel Signature [6] as descriptors against functional maps using our Diff3F descriptors. Because our descriptors are semantic, they enable Functional Maps to handle non-isometric deformations, cases where FMs with traditional geometric descriptors typically struggle. Our descriptors yield accurate correspondence in most cases, eliminating the need for the further refinement algorithms typically used in related works.
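As a sketch of how descriptors enter a functional map pipeline, the least-squares fit below estimates the map C from descriptor constraints in the spectral domain. The eigenbases and descriptor inputs are assumptions for illustration; a full pipeline would add the usual regularizers.

```python
import torch

def functional_map(phi_s, phi_t, desc_s, desc_t):
    """Fit a functional map C from descriptor-preservation constraints.

    phi_s, phi_t:   (N, k), (M, k) truncated Laplace-Beltrami eigenbases
    desc_s, desc_t: (N, C), (M, C) per-point Diff3F descriptors
    """
    A = torch.linalg.pinv(phi_s) @ desc_s   # (k, C) source spectral coefficients
    B = torch.linalg.pinv(phi_t) @ desc_t   # (k, C) target spectral coefficients
    # Solve C A = B in the least-squares sense: C = B A^+
    C = B @ torch.linalg.pinv(A)            # (k, k) functional map
    return C
```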
We compare our Diff3F against SOTA methods (DPC and SE-ORNet) for the task of point-to-point shape correspondence. Corresponding points are similarly colored. We show results with mesh rendering for the animal pair (top) and with point cloud rendering of our method for the human pair (bottom). While DPC and SE-ORNet both get confused by the different alignments of the human pair, resulting in laterally flipped predictions, ours, being a multi-view rendering-based method, is robust to rotation.
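The raw point-to-point correspondence shown throughout reduces to a nearest-neighbor search in descriptor space, with no regularization or smoothing; a minimal sketch under the same assumed descriptor shapes as above:

```python
import torch
import torch.nn.functional as F

def point_to_point(src_desc, tgt_desc):
    """Map every source point to the target point with the most
    similar descriptor (cosine similarity, no post-processing)."""
    s = F.normalize(src_desc, dim=-1)   # (N, C)
    t = F.normalize(tgt_desc, dim=-1)   # (M, C)
    return (s @ t.T).argmax(dim=1)      # (N,) matched target indices
```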
Each cell reports acc ↑ / err ↓.

| Dataset ↓ \ Method → | DPC [2] | SE-ORNet [3] | 3DCODED [4] | FM [5]+WKS [6] | Diff3F (ours) | Diff3F (ours)+FM [5] |
|---|---|---|---|---|---|---|
| TOSCA [7] | 30.79 / 3.74 | 33.25 / 4.32 | 0.5* / 19.2* | ✘ | 20.27 / 5.69 | ✘ |
| SHREC'19 [8] | 17.40 / 6.26 | 21.41 / 4.56 | 2.10 / 8.10 | 4.37 / 3.26 | 26.41 / 1.69 | 21.55 / 1.49 |
| SHREC'20 [9] | 31.08 / 2.13 | 31.70 / 1.00 | ✘ | 4.13 / 7.29 | 72.60 / 0.93 | 62.34 / 0.71 |
Each cell reports acc ↑ / err ↓.

| Train | Method | TOSCA [7] | SHREC'19 [8] | SHREC'20 [9] |
|---|---|---|---|---|
| SURREAL | DPC [2] | 29.30 / 5.25 | 17.40 / 6.26 | 31.08 / 2.13 |
| SURREAL | SE-ORNet [3] | 16.71 / 9.19 | 21.41 / 4.56 | 31.70 / 1.00 |
| SMAL | DPC [2] | 30.28 / 6.43 | 12.34 / 8.01 | 24.5* / 7.5* |
| SMAL | SE-ORNet [3] | 31.59 / 4.76 | 12.49 / 9.87 | 25.4* / 2.9* |
| Pretrained | Diff3F (ours) | 20.27 / 5.69 | 26.41 / 1.69 | 72.60 / 0.93 |
@InProceedings{Dutt_2024_CVPR,
author = {Dutt, Niladri Shekhar and Muralikrishnan, Sanjeev and Mitra, Niloy J.},
title = {Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {4494-4504}
}