Isaac Scientific Publishing
Frontiers in Signal Processing
FSP > Volume 3, Number 4, October 2019

Research on Tibetan Document Segmentation Based on GMM+K-means Algorithm

pp. 100-108, Pub. Date: October 10, 2019
DOI: 10.22606/fsp.2019.34006

Author(s)
Chengliang Jiang, Huazhang Wang
Affiliation(s)
College of Electrical & Information Engineering, Southwest Minzu University, Chengdu, China
Abstract
The Tibetan language, as the language of the Tubo period, has recorded the life, history and other important events of the Tibetan people, and is a treasure of Tibetan culture. To address the loss of Tibetan documents caused by the yellowing, blackening and rotting of paper with age, and to better protect these documents and reveal the contents recorded in them, a new method for segmenting Tibetan documents is proposed. The method uses an improved NLM (non-local means) algorithm for denoising as pre-processing, an automatic region-blocking GMM (Gaussian Mixture Model)+K-means multi-feature fusion algorithm for segmentation, and multi-region classification extraction as post-processing. Experimental results show that, compared with K-means, GMM and other algorithms, this method segments the text in Tibetan documents more effectively, which supports its effectiveness and accuracy.
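As a rough illustration of the GMM+K-means idea named in the abstract, the sketch below uses K-means cluster centers to initialize a two-component Gaussian mixture over grayscale pixel intensities, then labels each pixel by the more probable component. It is not the authors' algorithm: it leaves out the improved NLM denoising, the automatic region blocking, the multi-feature fusion and the post-processing, and it works on intensity alone. The names `kmeans_1d`, `gmm_1d`, `segment` and the tiny `page` array are hypothetical and chosen only for this example.

```python
import math

def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D K-means; centers start evenly spaced between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_1d(values, init_means, iters=30, min_var=1.0):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k, n = len(init_means), len(values)
    means = list(init_means)
    spread = (max(values) - min(values)) / k
    variances = [max(min_var, spread ** 2)] * k  # broad initial variance (assumption)
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[j] * gaussian_pdf(v, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(min_var, var)  # floor avoids a collapsed component
    return weights, means, variances

def segment(image):
    """Label each pixel 1 (darker component, assumed ink) or 0 (paper)."""
    flat = [float(p) for row in image for p in row]
    weights, means, variances = gmm_1d(flat, kmeans_1d(flat, k=2))
    ink = min(range(2), key=lambda j: means[j])

    def label(p):
        post = [weights[j] * gaussian_pdf(p, means[j], variances[j]) for j in range(2)]
        return 1 if post[ink] >= post[1 - ink] else 0

    return [[label(float(p)) for p in row] for row in image]

# Toy 3x4 "page": bright paper (~200) with a few dark ink pixels (~60).
page = [
    [200, 198, 60, 205],
    [195, 55, 58, 202],
    [201, 199, 62, 197],
]
mask = segment(page)
```

Starting EM from the K-means centers gives the mixture a sensible initial split, which is one common reason for combining the two models. A real degraded page would need denoising first and more features than raw intensity.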
Keywords
Tibetan documents, NLM, GMM, K-means
Copyright © 2019 Isaac Scientific Publishing Co. All rights reserved.