READABLE TEXT RETRIEVAL FROM NOISE-INFLUENCED DOCUMENTS USING IMAGE RESTORATION METHODS
Keywords:
Document Scanning, Noise Reduction, Adaptive Gaussian, Signal Ratio, Quality Enhancement, Restoration Techniques
Abstract
Document scanning has become a necessary step in official record keeping in everyday business environments. Scanned document images typically suffer from various types of noise that cause serious problems when the documents are read. This noise can have several causes: low-quality paper, paper aging, the scanner assembly and toner, unskilled machine operators, or copying-machine artifacts. Removing noise from scanned documents remains a major challenge for researchers in the digital era. Prior work has addressed digitized handwritten and machine-printed degraded historical documents. We experimented with different datasets, such as the Media Team Document Database and manually scanned noisy documents, and chose a collection of scanned noise-affected documents that are publicly available online. We first convert each noise-affected document image into a binarized image, then apply noise-reduction techniques to enhance the text so that it becomes readable and noise-free. An Adaptive Gaussian Mixture Model fitted by Expectation Maximization (EM) restores each image pixel to the value expected to be its original one. The restored documents show visually enhanced text and improved quantitative measures. To evaluate the proposed restoration method against state-of-the-art methods, we computed the Signal-to-Noise Ratio (SNR), Mean Square Error / Mean Square Difference (MSE/MSD), Peak Signal-to-Noise Ratio (PSNR), Contrast, and Energy. The study is quantitative: the experiments were performed on digital sensor data, and the results were evaluated with computational techniques. The results support the proposed methodology, which performs well in comparison with the state-of-the-art methods.
Overall, the proposed methodology is easy to understand and simple to implement.
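The EM step and the error metrics named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits an ordinary (non-adaptive) two-component Gaussian mixture to grayscale intensities, one component for ink and one for paper, and maps each pixel to 0 or 255 according to its more probable component. The helper names (`fit_gmm_em`, `binarize_gmm`, `mse`, `psnr`, `snr`) are hypothetical, images are assumed to be lists of rows with intensities in [0, 255], and SNR uses one common definition (reference signal power over residual power, in dB). Contrast and Energy are omitted because the abstract does not state how they are defined.

```python
import math
from typing import List, Tuple

# Assumption: grayscale image as a list of rows, intensities in [0, 255].
Image = List[List[float]]


def _log_joint(x: float, w: List[float], mu: List[float], var: List[float]) -> List[float]:
    """log(w_k * N(x | mu_k, var_k)) for each mixture component k."""
    return [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
            - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(len(w))]


def _responsibilities(x: float, w: List[float], mu: List[float], var: List[float]) -> List[float]:
    """Posterior p(k | x), normalised with log-sum-exp to avoid underflow."""
    lj = _log_joint(x, w, mu, var)
    m = max(lj)
    e = [math.exp(v - m) for v in lj]
    s = sum(e)
    return [v / s for v in e]


def fit_gmm_em(pixels: List[float], iters: int = 50,
               min_var: float = 1.0) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: plain two-component 1-D GMM (ink vs. paper) fitted by EM.
    Not the paper's adaptive variant."""
    lo, hi = min(pixels), max(pixels)
    w, mu = [0.5, 0.5], [float(lo), float(hi)]  # start one dark and one bright component
    var = [max(((hi - lo) / 4) ** 2, min_var)] * 2
    n = len(pixels)
    for _ in range(iters):
        # E-step: soft assignment of every pixel to each component
        resp = [_responsibilities(x, w, mu, var) for x in pixels]
        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, pixels)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, pixels)) / nk, min_var)
    return w, mu, var


def binarize_gmm(img: Image) -> Image:
    """Map each pixel to 0 (ink) or 255 (paper) by its more probable component."""
    pixels = [float(v) for row in img for v in row]
    w, mu, var = fit_gmm_em(pixels)
    dark = 0 if mu[0] <= mu[1] else 1
    return [[0 if _responsibilities(v, w, mu, var)[dark] >= 0.5 else 255 for v in row]
            for row in img]


def mse(ref: Image, test: Image) -> float:
    """Mean squared difference between two equally sized images."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(ref: Image, test: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    e = mse(ref, test)
    return float("inf") if e == 0 else 10 * math.log10(peak ** 2 / e)


def snr(ref: Image, test: Image) -> float:
    """Assumed definition: reference signal power over residual power, in dB."""
    signal = sum(a ** 2 for row in ref for a in row)
    noise = sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb))
    if noise == 0:
        return float("inf")
    if signal == 0:
        return float("-inf")
    return 10 * math.log10(signal / noise)
```

For example, on a synthetic 8×8 page holding one dark vertical stroke with a deterministic ±20 intensity perturbation, `binarize_gmm(noisy)` recovers the clean page exactly, so `psnr(clean, binarize_gmm(noisy))` is infinite. The log-sum-exp normalization in `_responsibilities` keeps the E-step stable when a pixel lies far from both component means.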