Document Type

Thesis - Open Access

Award Date

2024

Degree Name

Master of Science (MS)

Department / School

Electrical Engineering and Computer Science

First Advisor

Chulwoo Pack

Abstract

Documents often suffer from various types of degradation that make them difficult to read and limit OCR performance. This study investigates the effectiveness of perceptual loss in enhancing document image cleanup by comparing a GAN-based model and a diffusion model. In our experiments, we used DE-GAN as the GAN-based model and NAF-DPM as the diffusion model, both enhanced by incorporating perceptual loss. We compared the two models and evaluated them on the DIBCO 2013, DIBCO 2017, and H-DIBCO 2018 datasets; the evaluation showed that our approach consistently outperforms existing state-of-the-art methods. Incorporating perceptual loss significantly improves the visual quality and readability of the enhanced documents, leading to better OCR performance and higher evaluation metrics across all tested datasets. These metrics include the Structural Similarity Index (SSIM), which assesses the preservation of structural information in images based on human visual perception principles. Following SSIM, we also considered Peak Signal-to-Noise Ratio (PSNR), a common metric that primarily assesses pixel-level fidelity, although its reliability varies with content changes, as noted by Huynh-Thu and Ghanbari (2008) [45]. PSNR relates indirectly to perceptual quality by measuring how closely the enhanced document retains its original pixel details and reduces noise, particularly when the document content, such as text and layout, is consistent across the images. Additionally, we evaluated both the F-measure and the pseudo-F-measure (Fps), as described by Vlăsceanu et al. [52]. While conventional metrics like the F-measure are widely adopted to assess pixel accuracy, they may not fully capture the quality of structural preservation, which is critical in tasks such as OCR.
To address this, we also used the pseudo-F-measure, particularly when combined with skeletonization, as it is better suited for evaluating how well thin structures and text strokes are preserved during the binarization process. When perceptual loss is incorporated into the binarization task, the focus shifts toward maintaining visual coherence at a structural level, which aligns with the objectives of the pseudo-F-measure. Given the need for high-quality text extraction and structural preservation in OCR and DIBCO-style evaluations, the pseudo-F-measure becomes an essential metric for assessing the impact of perceptual loss on document binarization quality.
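
As an illustration of the two pixel-level metrics named above, the following is a minimal sketch of PSNR and the plain F-measure for a binarized document image. It is not the thesis's evaluation code: the function names `psnr` and `f_measure`, the 8-bit peak value of 255, and the convention that text pixels are marked `True` are assumptions made for this example. The pseudo-F-measure, which replaces recall with a skeleton-based pseudo-recall, is not implemented here.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images.

    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def f_measure(ground_truth, prediction):
    """Pixel-level F-measure for binary masks.

    Assumes foreground (text) pixels are True/nonzero in both masks.
    """
    gt = np.asarray(ground_truth, dtype=bool)
    pred = np.asarray(prediction, dtype=bool)
    tp = np.logical_and(gt, pred).sum()    # text predicted as text
    fp = np.logical_and(~gt, pred).sum()   # background predicted as text
    fn = np.logical_and(gt, ~pred).sum()   # text predicted as background
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2.0 * precision * recall / (precision + recall))


if __name__ == "__main__":
    gt_mask = np.array([[1, 1], [0, 0]])
    pred_mask = np.array([[1, 0], [1, 0]])
    print("F-measure:", f_measure(gt_mask, pred_mask))
    print("PSNR (dB):", psnr(gt_mask * 255, pred_mask * 255))
```

Because the F-measure counts every pixel equally, a thinned or broken stroke costs only a few pixels even when it makes a character unreadable, which is the limitation the pseudo-F-measure is meant to address.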

Publisher

South Dakota State University

Rights Statement

In Copyright