A photo archive of 12M+ uncategorized portraits needed to be grouped by identity, and manual tagging at that scale was infeasible. Variations in pose, lighting, and image quality made automated clustering difficult.
Trained a face recognition model on historical photos to generate embeddings, then grouped the embeddings with hierarchical clustering, tuning the distance thresholds at which clusters stop merging. Routed ambiguous cases to manual review loops.
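
The clustering and review step can be illustrated with a minimal sketch. It is not the project's actual pipeline: the cosine distance, the average linkage, the names `cluster_identities`, `merge_threshold`, and `review_margin`, and the toy 2-D vectors are all assumptions made for illustration. The naive pairwise search is also far too slow for millions of images and is only meant to show the logic of cutting the hierarchy at a threshold and flagging near-threshold pairs for a human reviewer.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(c1, c2, embeddings):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
    )
    return total / (len(c1) * len(c2))


def cluster_identities(embeddings, merge_threshold, review_margin):
    """Agglomerative clustering cut at a distance threshold.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than merge_threshold. Cluster pairs that end up just
    above the threshold (within review_margin) are returned separately
    as candidates for manual review.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        # Find the closest pair of clusters (a < b by construction).
        (a, b), dist = min(
            (
                ((a, b), average_linkage(clusters[a], clusters[b], embeddings))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > merge_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Pairs that narrowly missed merging are ambiguous: same person or not?
    review_pairs = [
        (a, b)
        for a, b in combinations(range(len(clusters)), 2)
        if average_linkage(clusters[a], clusters[b], embeddings)
        <= merge_threshold + review_margin
    ]
    return clusters, review_pairs


# Hypothetical toy embeddings: two clear identities plus one borderline face.
toy_embeddings = [
    [1.0, 0.0], [0.99, 0.05],   # close pair, likely one person
    [0.0, 1.0], [0.05, 0.99],   # close pair, likely another person
    [0.9, 0.45],                # near the first pair, but not clearly the same
]

clusters, review_pairs = cluster_identities(
    toy_embeddings, merge_threshold=0.05, review_margin=0.10
)
print("clusters:", clusters)
print("pairs of cluster indices to review:", review_pairs)
```

In this sketch the threshold controls the trade-off between splitting one person across clusters and merging different people, while the margin controls how much work is sent to reviewers. Both values here are arbitrary and would need tuning against labeled samples.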