Robust Page Detection for Scanned Historical Books
The Challenge
The client's automated cropping of historical book scans was highly unreliable, failing 30% of the time because of warped pages and shadows. Each failure needed manual correction, which stalled their entire digitization workflow.
Our Solution
Trained a U-Net segmentation model to predict page masks for noisy scans drawn from a variety of content collections. Used transfer learning from a pre-trained model and augmented the training data with synthetic distortions. Post-processed the predicted masks with morphological operations and contour refinement to extract precise page boundaries.
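To make the augmentation and post-processing steps more concrete, here is a minimal NumPy sketch. It is not the project's code. The distortion function is a toy assumption about what "synthetic distortions" might mean: a linear gutter-style shadow plus a sinusoidal sideways shift of each row, applied identically to image and mask. The post-processing applies a morphological opening and closing with a 3x3 square element, keeps the largest connected region as the page, and returns its bounding box. The function names (`synthetic_distortion`, `page_box`), the 0.5 threshold, and the bounding box used in place of true contour refinement are all hypothetical choices. The U-Net itself is out of scope.

```python
import numpy as np
from collections import deque


def _shifts(mask, k):
    """All (2k+1)^2 shifted views of a boolean mask; out-of-image pixels count as False."""
    h, w = mask.shape
    p = np.pad(mask, k, mode="constant", constant_values=False)
    return [p[dy:dy + h, dx:dx + w] for dy in range(2 * k + 1) for dx in range(2 * k + 1)]


def erode(mask, k=1):
    return np.logical_and.reduce(_shifts(mask, k))


def dilate(mask, k=1):
    return np.logical_or.reduce(_shifts(mask, k))


def largest_component(mask):
    """Keep only the largest 4-connected blob, assumed to be the page."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue, blob = deque([(y, x)]), []
        while queue:  # breadth-first flood fill
            cy, cx = queue.popleft()
            blob.append((cy, cx))
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(blob) > len(best):
            best = blob
    out = np.zeros_like(mask, dtype=bool)
    for y, x in best:
        out[y, x] = True
    return out


def page_box(prob, threshold=0.5, k=1):
    """Hypothetical post-processing: returns (top, left, bottom, right) or None."""
    mask = prob >= threshold
    mask = dilate(erode(mask, k), k)  # opening: removes specks and thin noise
    mask = erode(dilate(mask, k), k)  # closing: fills small holes inside the page
    page = largest_component(mask)
    if not page.any():
        return None
    ys, xs = np.nonzero(page)  # a bounding box stands in for contour refinement
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())


def synthetic_distortion(image, mask, rng, max_shift=3, shadow=0.5):
    """Toy augmentation for a 2D float image in [0, 1] and its boolean page mask."""
    h, w = image.shape
    assert 0 <= max_shift < w
    shaded = image * np.linspace(1.0 - shadow, 1.0, w)[None, :]  # darkest at the left edge
    phase = rng.uniform(0, np.pi)
    offsets = np.rint(max_shift * np.sin(np.linspace(0, np.pi, h) + phase)).astype(int)
    out_img, out_mask = np.zeros_like(shaded), np.zeros_like(mask)
    for r, off in enumerate(offsets):
        # Shift the image row and mask row by the same amount so labels stay aligned.
        src = slice(max(0, -off), w - max(0, off))
        dst = slice(max(0, off), w - max(0, -off))
        out_img[r, dst] = shaded[r, src]
        out_mask[r, dst] = mask[r, src]
    return out_img, out_mask
```

As an example, a 20x20 probability map with a page at rows 3 to 16 and columns 4 to 15, a stray speck in one corner, and a low-confidence pixel inside the page gives `page_box(prob) == (3, 4, 16, 15)`: the opening drops the speck and the closing fills the hole. A production pipeline would more likely rely on OpenCV (`cv2.morphologyEx`, `cv2.findContours`, `cv2.approxPolyDP`) for speed and to recover an actual page outline rather than an axis-aligned box.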
Results & Impact
Removed the need for time-consuming manual page cropping fixes
Tighter crops improved downstream OCR accuracy across multiple client collections
Ready to Transform Your Business?
Let's discuss how we can help you achieve similar results.
Schedule a Consultation