Large-Scale Face Clustering for Photo Archives

The Challenge

A photo archive with 12M+ uncategorized portraits required identity grouping, but manual tagging was infeasible. Variations in pose, lighting, and image quality made clustering difficult.

Our Solution

We trained a face recognition model on historical photos to generate embeddings, then grouped those embeddings using hierarchical clustering with optimized distance thresholds. Ambiguous cases were routed through manual review loops.
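The clustering step can be illustrated with a small, dependency-free sketch. It is not the project's actual pipeline: the linkage type (single linkage), the cosine metric, both threshold values, and the names `cluster_embeddings`, `merge_threshold`, and `review_threshold` are assumptions chosen for illustration. Pairs closer than `merge_threshold` are merged, which is equivalent to cutting a single-linkage hierarchy at that distance, and pairs falling between the two thresholds are collected as candidates for manual review.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, merge_threshold=0.1, review_threshold=0.35):
    """Single-linkage clustering cut at merge_threshold (illustrative values).

    Returns (labels, review_pairs): labels[i] is the cluster id of item i,
    and review_pairs lists index pairs from different clusters whose
    distance lies in [merge_threshold, review_threshold).
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    borderline = []
    for i, j in combinations(range(n), 2):
        d = cosine_distance(embeddings[i], embeddings[j])
        if d < merge_threshold:
            parent[find(i)] = find(j)  # close enough: same identity
        elif d < review_threshold:
            borderline.append((i, j))  # ambiguous: hold for a human

    # Relabel roots as 0..k-1 in order of first appearance.
    label_of = {}
    labels = [label_of.setdefault(find(i), len(label_of)) for i in range(n)]

    # Drop borderline pairs that ended up in the same cluster anyway.
    review_pairs = [(i, j) for i, j in borderline if labels[i] != labels[j]]
    return labels, review_pairs


# Toy 2-D "embeddings": two tight groups plus one item between them.
embeddings = [
    [1.0, 0.0], [0.98, 0.05],   # group A
    [0.0, 1.0], [0.05, 0.99],   # group B
    [0.7, 0.7],                 # sits between A and B
]
labels, review_pairs = cluster_embeddings(embeddings)
print(labels)        # [0, 0, 1, 1, 2]
print(review_pairs)  # [(0, 4), (1, 4), (2, 4), (3, 4)]
```

The all-pairs loop is quadratic and only suits small examples. At archive scale, a nearest-neighbor index such as FAISS would presumably restrict comparisons to each embedding's closest candidates, though the source does not describe how it was used.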

Technologies Used

  • FaceNet
  • FAISS
  • PyTorch
  • Hierarchical Clustering

Results & Impact

  • Clustered 12M+ photos into ~800K distinct identities
  • Achieved 89% precision/recall on identity grouping
  • Reduced manual tagging effort by 90%