Robust Page Detection for Scanned Historical Books

Rare Books Digital Library | October 2023
U-Net, Semantic Segmentation, TensorFlow, Transfer Learning, OpenCV, Morphological Operations

The Challenge

Automated cropping of historical book scans was unreliable, failing 30% of the time because of warped pages and shadows. The failures required constant manual intervention, which stalled the library's entire digitization workflow.

Our Solution

We trained a U-Net model to predict page masks on noisy scans from a variety of content collections. Training used transfer learning from a pre-trained model, with the training data augmented by synthetic distortions. Post-processing with morphological operations and contour refinement then extracted precise page boundaries from the predicted masks.

Results & Impact

Removed the need for time-consuming manual page cropping fixes

Tighter crops improved downstream OCR accuracy across multiple client collections
