Large-Scale Face Clustering for Photo Archives

Heritage Photos Inc., January 2024
FaceNet, FAISS, PyTorch, Hierarchical Clustering, Active Learning, Vector Search

The Challenge

A massive archive of 12 million portraits was essentially a "digital shoebox," impossible to search or navigate by person. Manual tagging was financially out of the question, and off-the-shelf facial recognition failed on the low-quality, historical images.

Our Solution

We fine-tuned a FaceNet model on historical photos to generate 128-dimensional embeddings optimized for vintage photography. We used FAISS for efficient similarity search and applied hierarchical clustering with tuned distance thresholds. We also incorporated active learning loops for ambiguous cases and implemented a human-in-the-loop review system.
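
To make the thresholding idea concrete, here is a minimal sketch in plain Python. It is not the project's code. It assumes L2-normalized embeddings, uses a brute-force pairwise comparison as a stand-in for a FAISS index, and uses single-linkage clustering cut at a distance threshold as one possible form of hierarchical clustering. The function name `cluster_with_review` and the thresholds `merge_below` and `review_below` are hypothetical, and the toy vectors are 3-dimensional rather than 128-dimensional.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_with_review(embeddings, merge_below=0.6, review_below=0.9):
    """Single-linkage clustering cut at `merge_below` (hypothetical value).

    Pairs whose distance falls in [merge_below, review_below) and that end
    up in different clusters are returned as candidates for human review.
    """
    vecs = [l2_normalize(v) for v in embeddings]
    parent = list(range(len(vecs)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    borderline = []
    # Brute-force all pairs; at archive scale an approximate
    # nearest-neighbor index would replace this loop.
    for i, j in combinations(range(len(vecs)), 2):
        d = euclidean(vecs[i], vecs[j])
        if d < merge_below:
            parent[find(i)] = find(j)  # confident match: merge clusters
        elif d < review_below:
            borderline.append((i, j, round(d, 3)))

    labels = [find(i) for i in range(len(vecs))]
    # Drop borderline pairs that were merged transitively anyway.
    review_queue = [(i, j, d) for i, j, d in borderline if labels[i] != labels[j]]
    return labels, review_queue


# Toy data: two clear identities plus one face that sits between them.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.98, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.05, 0.99, 0.0],
    [0.7, 0.7, 0.1],
]
labels, review_queue = cluster_with_review(toy_embeddings)
print(labels)
print(review_queue)
```

In this toy run the first two vectors share a label, the next two share another, and the fifth stays on its own, with every pairing that involves it placed in the review queue. That borderline band is the kind of case an active learning loop would send to a human reviewer.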

Results & Impact

Clustered 12M+ photos into ~800K distinct identities

Achieved 89% precision/recall on identity grouping

Reduced manual tagging effort by 90%

Enabled family history research at unprecedented scale
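
The case study does not say how precision and recall on identity grouping were measured. One common convention for clustering is pairwise scoring against a hand-labeled sample, sketched below as an assumption rather than the project's actual evaluation. The function name `pairwise_precision_recall` and the sample labels are hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, truth):
    """Score a clustering by the photo pairs it places together.

    A pair is a true positive when both the predicted clustering and the
    ground truth put the two photos under the same identity.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1  # grouped together, but different people
        elif same_true:
            fn += 1  # same person, but split across clusters
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical labeled sample: five photos, two true identities.
truth = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]
precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)  # 0.5 0.5
```

Precision falls when different people are merged into one cluster, and recall falls when one person is split across several clusters, so the two numbers track the two failure modes that the distance threshold trades off.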
